DESCRIPTION

MULTIPLEXER AND DEMULTIPLEXER 

CROSS-REFERENCE TO RELATED APPLICATIONS 

This application is the National Stage of International
Application No. PCT/JP03/07639, filed June 17, 2003.

BACKGROUND OF THE INVENTION

1. Field of the Invention

The present invention relates to a multiplexer that
multiplexes media data such as video data, audio data and the like,
and a demultiplexer that reads and demultiplexes a bit string
in which media data such as video data, audio data and the like are
multiplexed.



2. Description of the Related Art

The recent increase in the capacity of communication networks and
the development of transmission techniques have remarkably
popularized online video distribution services that distribute
video files of multimedia content, including video, audio, text,
still pictures and the like, to personal computers. Also, the 3rd
Generation Partnership Project (3GPP), an international
standardization group whose objective is to standardize so-called
third-generation mobile communication systems, including mobile
terminals, is in the process of defining the transparent end-to-end
packet-switched streaming service (TS26.234) as a standard for
wireless video distribution, and video distribution services are
expected to be increasingly provided to mobile communication
terminals such as mobile phones and PDAs.

When a video file is distributed in such a video distribution
service, a multiplexer reads media data such as video, still
pictures, audio, text and the like, and multiplexes the header
information necessary for playing back the media data with the
entity data of the media data so as to generate video file data.
The MP4 file format has attracted attention as a multiplex file
format for this video file data.

The MP4 file format is a multiplex file format being
standardized by ISO/IEC JTC1/SC29/WG11, an international
standardization group of the International Organization for
Standardization and the International Electrotechnical Commission,
and is expected to become widespread because it is also employed
by TS26.234 of the above-mentioned 3GPP.

Here, the data structure of the MP4 file will be explained.
The MP4 file stores the header information and the entity
data of media data in units of objects called boxes, and is made
up of plural boxes that are arranged hierarchically.

FIG. 1 is a diagram for explaining the structure of a box 
included in a conventional MP4 file. 

The box 901 is made up of a box header part 902, where the
header information of the box 901 is stored, and a box data storage
part 903, where the data included in the box 901 (such as sub-boxes
of the box and fields describing information) is stored.

The box header part 902 has the fields box size 904, box
type 905, version 906 and flag 907.

The box size 904 is the field describing the size of the
whole box 901, including the bytes assigned to this field itself.

The box type 905 is the field describing the identifier that
identifies the type of the box 901. This identifier is generally
represented by four alphabetic characters. Note that in this
specification, a box is sometimes referred to by this identifier.

The version 906 is the field describing the version number
of the box 901, and the flag 907 is the field describing flag
information that is set for each box 901. The version 906 and the
flag 907 are not necessary for all boxes 901, and some boxes 901
do not have these fields.
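The box header layout described above can be sketched as a small Python parser. This sketch assumes the byte layout of ISO/IEC 14496-12, on which MP4 is based: a big-endian 32-bit size, a four-character type and, only for so-called full boxes, an 8-bit version followed by 24 bits of flags. The `full_box` parameter and the `BoxHeader` record are assumptions of this sketch, since whether a box carries the version and flag fields depends on its type.

```python
import struct
from dataclasses import dataclass
from typing import Optional


@dataclass
class BoxHeader:
    size: int                       # size of the whole box in bytes, header included
    box_type: str                   # four-character identifier, e.g. "moov"
    header_len: int                 # bytes used by the header fields parsed here
    version: Optional[int] = None   # only present in full boxes
    flags: Optional[int] = None     # 24-bit flags, only present in full boxes


def parse_box_header(buf: bytes, offset: int = 0, full_box: bool = False) -> BoxHeader:
    """Parse the header of the box that starts at buf[offset]."""
    size, raw_type = struct.unpack_from(">I4s", buf, offset)
    header_len = 8
    if size == 1:
        # A 64-bit "largesize" field follows the type field.
        (size,) = struct.unpack_from(">Q", buf, offset + 8)
        header_len = 16
    elif size == 0:
        # Size 0 means the box runs to the end of the data.
        size = len(buf) - offset
    box_type = raw_type.decode("ascii")
    if not full_box:
        return BoxHeader(size, box_type, header_len)
    # Full boxes add one byte of version followed by 24 bits of flags.
    (word,) = struct.unpack_from(">I", buf, offset + header_len)
    return BoxHeader(size, box_type, header_len + 4,
                     version=word >> 24, flags=word & 0xFFFFFF)
```

For example, `parse_box_header(struct.pack(">I4sI", 12, b"trun", 0x301), full_box=True)` yields version 0 and flags `0x301`, while a plain `moov` header yields no version or flags.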
The MP4 file, made up of a series of boxes 901 with this
structure, can be broadly divided into a basic part that is essential
to the file structure and an extension part that is used as needed.
First, the basic part of the MP4 file will be explained.

FIG. 2 is a diagram for explaining the basic part of a
conventional MP4 file.

The basic part 911 of the MP4 file 910 is made of a file 
header part 912 and a file data part 913. 

The file header part 912 is the part where header
information of the whole file, such as information on the
compression coding method of the video data, is stored, and it is
made up of a file type box 914 and a movie box 915.

The file type box 914 is a box identified by the identifier
"ftyp" and stores information for identifying the MP4 file. Since
a standardization group or a service provider can arbitrarily
prescribe which media data is stored in the MP4 file and which
compression coding method is used for the video data, the audio
data and the like stored in the MP4 file, information identifying
the prescription according to which the MP4 file was generated is
stored in this file type box 914.

The movie box 915 is the box identified by the identifier
"moov" and stores header information, such as the display duration,
of the entity data stored in the file data part 913.

The file data part 913 is made up of a movie data box 916
identified by the identifier "mdat". Note that, instead of this file
data part 913, it is also possible to refer to an external file
different from this MP4 file 910. When an external file is
referred to in this way, the basic part 911 of the MP4 file 910 is
made up essentially of the file header part 912. In this
specification, the case where the entity data is included in the MP4
file 910 will be explained, not the case where an external file is
referred to.

The movie data box 916 is a box for storing the entity data
of the media data in units called samples. A sample is the
smallest access unit in the MP4 file and corresponds, for example,
to a video object plane (VOP) of video data coded with the MPEG-4
Visual (Moving Picture Experts Group 4 Visual) compression coding
method, or to a frame of audio data.

Here, the lower hierarchy in the structure of the movie box 
915 in the basic part of a conventional MP4 file will be explained. 

FIG. 3 is a diagram for explaining the structure of the movie 
box in the conventional MP4 file. 

As shown in FIG. 3A, the movie box 915 is made up of the box
header part 902 and the box data storage part 903 that have
already been explained. The size information of the movie
box 915 ("xxxx" in FIG. 3A) is described in the box size 904 field
of the box header part 902, and the identifier "moov" of the movie
box 915 is described in the box type 905 field.

Also, the box data storage part 903 of the movie box 915 stores
the movie header box 917, where the header information of the
basic part 911 of the MP4 file 910 is stored, and the track box 918,
where the header information of each track, such as the video track
and the audio track, is stored. Note that a track here means the
whole of the sample data of each medium included in the MP4 file
910, and the tracks of video, audio, text and the like are called a
video track, an audio track, a text track and the like, respectively.
Also, when the MP4 file 910 includes a plurality of data items of
the same medium, a plurality of tracks exist for that medium. For
example, when two types of video data are included in the MP4 file
910, two video tracks exist.

The movie header box 917 is likewise made up of the box header
part 902 and the box data storage part 903. The size information
of the movie header box 917 ("xxx" in FIG. 3A) is described in the
box size 904 field of the box header part 902, and the identifier
"mvhd" of the movie header box 917 is described in the box type
905 field. Information such as the duration needed for playing back
the content included in the basic part 911 of the MP4 file 910 is
stored in the box data storage part 903 of the movie header box 917.

Also, the size information of the track box 918 ("xx" in FIG.
3A) is described in the box size 904 field of the box header part
902 of the track box 918, and the identifier "trak" of the track box
918 is described in the box type 905 field. The track header box
919 is stored in the box data storage part 903 of the track box 918.

The track header box 919 is the box that has fields
describing the header information of each track and is identified
by the identifier "tkhd". The box data storage part 903 of this
track header box 919 holds fields such as a track ID for identifying
the track and information on the duration needed for playing back
the track.

In this way, boxes 901 are arranged hierarchically in the
movie box 915: the header information of each track, for video,
audio or the like, is stored in a track box 918 identified by "trak",
and the header information of each sample of the track is stored in
lower-level boxes included in this track box 918.

When the structure of the movie box 915 shown in FIG. 3A is
drawn as a tree, the diagram of FIG. 3B is obtained.

In other words, the movie header box 917 and the track box 918
are arranged as lower-level boxes of the movie box 915, the track
header box 919 is arranged as a lower-level box of the track box
918, and the boxes 901 are thus arranged hierarchically.
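The tree of FIG. 3B can be recovered from raw bytes by walking the boxes recursively. The following is a minimal sketch assuming plain 32-bit box sizes as in ISO/IEC 14496-12; `CONTAINER_TYPES` is an illustrative, incomplete set chosen for this example, and the zero-filled `mvhd` and `tkhd` payloads are placeholders rather than valid header contents.

```python
import struct
from typing import Iterator, Optional, Tuple

# Illustrative subset of box types whose data part holds further boxes.
CONTAINER_TYPES = {"moov", "trak", "mdia", "minf", "stbl", "mvex", "moof", "traf"}


def make_box(box_type: str, payload: bytes = b"") -> bytes:
    """Serialize a box: 32-bit size (header included), type, then payload."""
    return struct.pack(">I4s", 8 + len(payload), box_type.encode("ascii")) + payload


def walk_boxes(buf: bytes, start: int = 0, end: Optional[int] = None,
               depth: int = 0) -> Iterator[Tuple[int, str, int]]:
    """Yield (depth, type, size) for each box, descending into containers."""
    end = len(buf) if end is None else end
    pos = start
    while pos + 8 <= end:
        size, raw_type = struct.unpack_from(">I4s", buf, pos)
        if size < 8:  # largesize (1) and to-end (0) boxes are not handled here
            raise ValueError(f"unsupported box size {size} at offset {pos}")
        box_type = raw_type.decode("ascii")
        yield depth, box_type, size
        if box_type in CONTAINER_TYPES:
            yield from walk_boxes(buf, pos + 8, pos + size, depth + 1)
        pos += size


movie_box = make_box("moov", make_box("mvhd", bytes(4))
                     + make_box("trak", make_box("tkhd", bytes(4))))
for depth, box_type, size in walk_boxes(movie_box):
    print("  " * depth + f"{box_type} ({size} bytes)")
# moov (40 bytes)
#   mvhd (12 bytes)
#   trak (20 bytes)
#     tkhd (12 bytes)
```

The indentation of the printed output mirrors the parent and child relations of FIG. 3B: `mvhd` and `trak` below `moov`, and `tkhd` below `trak`.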

At the initial stage of the standardization of the MP4 file
format, the MP4 file 910 was made up essentially of the
above-mentioned basic part 911. However, as the amount of media
data grows, the file size grows as well, which causes various
problems such as difficulty in applying the format to streaming
playback. An improvement was therefore made in which an
extension part, where a plurality of combinations of a header box
and a data box are arranged serially, is additionally used.

FIG. 4 is a diagram showing the structure of a conventional 
MP4 file including an extension part. 

As shown in FIG. 4, the MP4 file 920 to which the
above-mentioned improvement is added is made up of a basic part
911 and an extension part 921. Since the MP4 file 920 including this
extension part 921 can store all of the media data in the extension
part 921, the movie data box 916 of the basic part 911 can be
omitted.

The extension part 921 is made up of a plurality of packets 922
divided in predetermined units.

Each packet 922 is made up of a pair of a movie fragment box
923 and a movie data box 916, and is also called a movie fragment.
The movie data box 916 stores the samples of each track in the
above-mentioned predetermined unit. The movie fragment box 923
is the box for storing the header information corresponding to this
movie data box 916 and is identified by the identifier "moof". The
structure of this movie fragment box 923 will now be explained
more specifically.

FIG. 5 is a diagram for explaining the structure of a
conventional movie fragment box.

As shown in FIG. 5, a movie fragment header box 924 and a
plurality of track fragment boxes 925 are stored in the box data
storage part 903 of the movie fragment box 923.

The movie fragment header box 924 is the box identified by
the identifier "mfhd" and stores the header information of the
whole movie fragment box 923.

The track fragment box 925 is the box identified by the 
identifier "traf" and stores the header information for each track. 

Note that a single track fragment box 925 is generally
prepared for the header information of a single track, but it is also
possible to prepare a plurality of track fragment boxes 925 for the
header information of a single track. When the header information
of a single track is divided among a plurality of track fragment
boxes 925 in this way, the track fragment boxes 925 are arranged in
ascending order of the decoding times of their leading samples.

A track fragment header box 926 and one or more track
fragment run boxes 927 are stored in the box data storage part
903 of each track fragment box 925.

The track fragment header box 926 is the box identified by
the identifier "tfhd" and stores a field describing the track ID that
identifies the track, and information on default values such as the
playback duration of a sample.

The track fragment run box 927 is the box identified by the
identifier "trun" and stores the header information of each
sample. This track fragment run box 927 will be explained with
reference to FIG. 6.

FIG. 6 is a diagram for explaining the structure of a 
conventional track fragment run box 927. 

The flag 907 is the field describing the flag information set for
each box 901; here it holds flag information showing whether each
of the fields from the data offset 929 to the sample composition
time offset 936 is included in the track fragment run box 927
following the flag 907.

The sample count 928 is the field describing the number of
sample header information items stored in the track fragment run
box 927.

The data offset 929 is the field describing pointer
information that shows where, in the movie data box 916 paired
with the track fragment run box 927, the entity data is located for
the leading sample among the samples whose header information
items are stored in the track fragment run box 927.

The leading sample flag 930 is the field whose value
overrides the value of the later-explained sample flag 935 field
when the leading sample of the track fragment run box 927 is a
randomly-accessible sample. Here, random access means the
operation, in a playback apparatus for the MP4 file, of moving the
playback position during playback, for example to a point 10
minutes later, or of starting playback from a point in the middle of
the data. A randomly-accessible sample is, among the video
samples, a sample constituting a frame that the playback apparatus
can decode on its own without referring to other frame data, that
is, an intra coded frame (a so-called intra frame). Note that all
audio samples are randomly accessible because each audio sample
can be decoded on its own.

The table 931 collects the entries 932, each showing the
header information of one sample; the number of entries 932 equals
the number shown in the sample count 928.

The entry 932 is a collection of fields showing the header
information of one sample, and which fields are included is
indicated by the above-mentioned flag 907. The fields that can be
included in the entry 932 are a sample duration 933 describing the
sample playback duration, a sample size 934 describing the sample
size, a sample flag 935 describing flag information that indicates
whether or not the sample is randomly accessible, and a sample
composition time offset 936 describing the difference between the
sample decoding time and the sample display time, which is needed
to handle samples coded with bidirectional prediction.

Note that when these fields are not included in the entry 932,
the default values of the fields described in the track fragment
header box 926 or in the movie extends box (identifier "mvex") in
the movie box 915 are used for the header information of each
sample.
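The flag-controlled layout of the track fragment run box 927 can be illustrated with a parser. This is a sketch assuming the 'trun' definition of ISO/IEC 14496-12, where the flag 907 corresponds to the 24-bit tr_flags and the leading sample flag 930 to first_sample_flags; the constant names are labels chosen for this sketch, and the default values are passed in as arguments instead of being read from 'tfhd' or from the 'trex' box inside 'mvex'.

```python
import struct
from typing import Optional

# tr_flags bits of the 'trun' box as defined in ISO/IEC 14496-12.
DATA_OFFSET_PRESENT = 0x000001
FIRST_SAMPLE_FLAGS_PRESENT = 0x000004
SAMPLE_DURATION_PRESENT = 0x000100
SAMPLE_SIZE_PRESENT = 0x000200
SAMPLE_FLAGS_PRESENT = 0x000400
SAMPLE_CTO_PRESENT = 0x000800
# Bit of sample_flags that marks a sample as NOT randomly accessible.
SAMPLE_IS_NON_SYNC = 0x00010000


def parse_trun_payload(payload: bytes, version: int, flags: int,
                       default_duration: int = 0, default_size: int = 0,
                       default_flags: int = 0) -> dict:
    """Parse a 'trun' body (the bytes after its version/flags word).

    Fields absent from an entry fall back to the given defaults."""
    pos = 0

    def read(fmt: str) -> int:
        nonlocal pos
        (value,) = struct.unpack_from(fmt, payload, pos)
        pos += 4  # every field read here is 32 bits wide
        return value

    sample_count = read(">I")
    data_offset: Optional[int] = read(">i") if flags & DATA_OFFSET_PRESENT else None
    first_flags = read(">I") if flags & FIRST_SAMPLE_FLAGS_PRESENT else None
    entries = []
    for i in range(sample_count):
        duration = read(">I") if flags & SAMPLE_DURATION_PRESENT else default_duration
        size = read(">I") if flags & SAMPLE_SIZE_PRESENT else default_size
        sample_flags = read(">I") if flags & SAMPLE_FLAGS_PRESENT else default_flags
        # Version 0 stores an unsigned composition offset, version 1 a signed one.
        cto = read(">i" if version else ">I") if flags & SAMPLE_CTO_PRESENT else 0
        if i == 0 and first_flags is not None:
            sample_flags = first_flags  # the leading sample flag takes precedence
        entries.append({"duration": duration, "size": size, "flags": sample_flags,
                        "cto": cto, "sync": not (sample_flags & SAMPLE_IS_NON_SYNC)})
    return {"data_offset": data_offset, "entries": entries}
```

In this layout, a run whose first sample is an intra frame and whose other samples are not can mark only the leading sample through first_sample_flags and leave the rest to the default flags, which keeps each entry short.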

Also, the header information items of the samples are described
in the track fragment run box 927 in decoding-time order.
Therefore, when an apparatus that plays back the MP4 file searches
for sample header information, it refers to the track IDs in the
track fragment header boxes 926 one by one, starting from the first
track fragment box 925 in the file, to find a track fragment box 925
including the header information of the desired track, and then
searches the sample header information in order, starting from the
first track fragment run box 927 in that track fragment box 925.
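The search order just described is a linear scan. The sketch below models it with a hypothetical `TrackFragment` record holding already-parsed entries, and counts the entries examined, which is the search cost at issue in this document. It assumes that each track starts at decode time 0 and that its fragments follow one another without gaps; a real file would carry explicit timing.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class TrackFragment:
    """One 'traf': the track ID from its 'tfhd' plus the entries of its 'trun' boxes."""
    track_id: int
    runs: List[List[dict]] = field(default_factory=list)


def find_sample(fragments: List[TrackFragment], track_id: int,
                target_time: int) -> Tuple[Optional[dict], int]:
    """Scan fragments in file order for the sample of `track_id` whose decode
    interval contains `target_time`; also report how many entries were read."""
    examined = 0
    t = 0  # assumption: the track starts at decode time 0 with no gaps
    for traf in fragments:
        if traf.track_id != track_id:
            continue  # header information of another track
        for run in traf.runs:
            for entry in run:
                examined += 1
                if t <= target_time < t + entry["duration"]:
                    return entry, examined
                t += entry["duration"]
    return None, examined
```

With two fragments of track 1 (three 30 ms samples each) separated by a fragment of track 2, looking up time 100 in track 1 reads four entries before it finds the sample.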
Note that, in the MP4 file 920 including the extension part
921, information necessary for the whole track, such as the initial
information used for decoding, is stored in the movie box 915.

Next, structure examples of an MP4 file including an
extension part 921 structured in this way will be explained.

FIG. 7 is a diagram showing structure examples of a
conventional MP4 file including an extension part.

FIG. 7 shows two examples of how a content is stored; in both,
the playback duration of the content is 60 seconds.

The MP4 file 940 shown in FIG. 7A has a structure that
stores media data in both the basic part 941 and the extension
part 942. In other words, the part of the media data from 0 to 30
seconds is stored in mdat_1 (code 945) of the basic part 941,
the part from 30 to 45 seconds is stored in mdat_2 (code 947) of
the extension part 942, and the part from 45 to 60 seconds is
stored in mdat_3 (code 949). In addition, the header information
of mdat_1 (code 945) is stored in moov 944, the header information
of mdat_2 (code 947) is stored in moof_1 (code 946), and the header
information of mdat_3 (code 949) is stored in moof_2 (code 948).

In contrast, the MP4 file 950 shown in FIG. 7B has a
structure that stores the media data in the extension part 952 only.
In other words, the basic part 951 is made up of ftyp 953 and moov
954 and does not include any mdat; the part of the media data from
0 to 30 seconds is stored in mdat_1 (code 956) in the extension part
952, and the part from 30 to 60 seconds is stored in mdat_2 (code
958). In addition, the header information of mdat_1 (code 956) is
stored in moof_1 (code 955), and the header information of mdat_2
(code 958) is stored in moof_2 (code 957).

Here, how the extension part of the above-mentioned MP4
file is generated will be explained with reference to FIG. 8 to
FIG. 10.

FIG. 8 is a block diagram showing the structure of the 
conventional multiplexer. 

The multiplexer 960 is an apparatus that multiplexes media
data and generates the extension part data of the MP4 file. Here,
the extension part data of the MP4 file is generated by multiplexing
video data and audio data.

The first input unit 961 captures video data into the
multiplexer 960 and has the first data storage unit 962 store the
video data. Likewise, the second input unit 964 captures audio data
into the multiplexer 960 and has the second data storage unit 965
store the audio data.
The first analysis unit 963 reads out the samples of the video
data one by one from the first data storage unit 962, analyzes them,
and outputs the header information of each video sample to the
packetization part determination unit 967. Likewise, the second
analysis unit 966 reads out the samples of the audio data one by one
from the second data storage unit 965, analyzes them, and outputs
the header information of each audio sample to the packetization
part determination unit 967. The header information of the video
samples and of the audio samples includes information indicating
the size and the playback duration of each sample, and the header
information of the video samples also includes information showing
whether or not each video sample is an intra frame.

The packetization part determination unit 967 determines
the packetization part of the video data and the audio data so that
the number of samples included in each packet is constant, and
generates the header information of each packet based on the
obtained sample header information.

FIG. 9 shows the processing flow of the conventional
packetization part determination unit. Here, the number of
samples stored in a packet is N, and the predetermined value N is
stored in a memory or the like of the multiplexer 960.

First, when the first analysis unit 963 obtains a video sample
(S901) and outputs the video sample header information to the
packetization part determination unit 967, the packetization part
determination unit 967 adds the video sample header information
to a packet generation table (S902).

Next, the packetization part determination unit 967 updates
the number of video samples included in the packet (S903) and
judges whether the number of video samples included in the
packet has reached N (S904).

The above-mentioned processing from S901 to S903 is
repeated while the number of video samples included in the packet
has not reached N (No in S904); when it reaches N, the
packetization part determination unit 967 packetizes the N video
samples and finishes the processing (S905).

Likewise, the packetization part determination unit 967 
packetizes the audio samples by performing the processing 
operation of the above-mentioned S901 to S905. 

After that, the packetization part determination unit 967
repeats the processing of this flow until all the samples have been
packetized.
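The fixed-count rule of FIG. 9 can be sketched as follows, together with a check of how it places synchronized video and audio samples. The 30 ms video and 20 ms audio sample durations and N = 10 are illustrative values chosen for this sketch, not values from the specification; the step comments map the loop onto S901 to S905, and treating leftover samples as a final short packet is an assumption.

```python
from typing import List


def packetize_fixed_count(samples: List[dict], n: int) -> List[List[dict]]:
    """Group one track's samples into packets of N samples, as in FIG. 9."""
    packets, current = [], []
    for sample in samples:            # S901: obtain the next sample
        current.append(sample)        # S902: add its header to the table
        if len(current) == n:         # S903/S904: count and compare with N
            packets.append(current)   # S905: packetize the N samples
            current = []
    if current:  # assumption: leftover samples form a final, shorter packet
        packets.append(current)
    return packets


def packet_index(packets: List[List[dict]], t: int) -> int:
    """Index of the packet holding the sample that is playing at time t."""
    for k, packet in enumerate(packets):
        last = packet[-1]
        if packet[0]["start"] <= t < last["start"] + last["duration"]:
            return k
    raise ValueError("time outside the track")


video = [{"start": 30 * i, "duration": 30} for i in range(100)]  # 30 ms frames
audio = [{"start": 20 * i, "duration": 20} for i in range(150)]  # 20 ms frames
video_packets = packetize_fixed_count(video, 10)
audio_packets = packetize_fixed_count(audio, 10)
print(packet_index(video_packets, 1000), packet_index(audio_packets, 1000))  # 3 5
```

Because each track is cut purely by sample count, the video and audio samples playing at the 1000 ms point end up in different packets (3 and 5), which is the situation analyzed as the first problem below.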

FIG. 10 shows an example of the packet generation table
that stores the header information of conventional video samples.
This packet generation table 968a describes, for each video sample,
the sample size, the sample playback duration, and an intra coded
frame flag showing whether or not the video sample is an intra
frame. In this example, the leading video sample stored in the
packet has a size of 300 bytes and a playback duration of 30 ms and
is not an intra coded frame, while the second video sample is an
intra coded frame. The packetization part determination unit 967
adds these information items to the packet generation table 968a in
sequence until the Nth sample included in the packet is reached,
and the table is then outputted to the packet generation table
storage unit 968.

Referring to FIG. 8 again, after describing the header
information of the N samples in the packet generation table 968a,
the packetization part determination unit 967 outputs the packet
generation table 968a to the packet generation table storage unit
968 and outputs a packet generation signal to the packet header
generation unit 969.
Upon obtaining the packet generation signal, the packet header
generation unit 969 reads out the sample header information of the
packet from the packet generation table 968a held in the packet
generation table storage unit 968 and generates moof data. The
packet header generation unit 969 also outputs the generated moof
data to the packet connection unit 971 and outputs, to the packet
data generation unit 970, mdat information including (i) pointer
information indicating where in the first data storage unit 962 and
the second data storage unit 965 the entity data of the samples
included in the packet are stored and (ii) the size information of
the samples.

The packet data generation unit 970 reads out the entity
data of the samples from the first data storage unit 962 and the
second data storage unit 965 based on the obtained mdat
information so as to generate mdat data, and outputs the mdat data
to the packet connection unit 971.

After that, the packet connection unit 971 connects the moof
data with the mdat data and outputs the resulting MP4 extension
part data for a single packet.

Finally, the outputted MP4 extension part data for each packet is
captured by an apparatus that generates the MP4 file, and the
successively generated MP4 extension part data are arranged in
sequence to form the extension part of the MP4 file. This file
generation apparatus then connects the basic part with the
extension part of the MP4 file so as to generate an MP4 file.
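The work of units 969 to 971 can be pictured as wrapping two byte strings in boxes and concatenating them. The sketch below assumes plain 32-bit box sizes; the `moof_payload` argument stands in for the mfhd and traf boxes that the packet header generation unit would build, and measuring the first sample's data offset from the first byte of 'moof' is an assumption of this sketch (it corresponds to the ISO/IEC 14496-12 default for the first track fragment when no explicit base data offset is given).

```python
import struct
from typing import List, Tuple


def make_box(box_type: str, payload: bytes) -> bytes:
    """Serialize a box: 32-bit size (header included), type, then payload."""
    return struct.pack(">I4s", 8 + len(payload), box_type.encode("ascii")) + payload


def build_packet(moof_payload: bytes, sample_data: List[bytes]) -> Tuple[bytes, int]:
    """Wrap header data in 'moof' and sample bytes in 'mdat', then connect them.

    Also returns the offset of the first sample's bytes, measured from the
    first byte of 'moof'."""
    moof = make_box("moof", moof_payload)           # packet header part (unit 969)
    mdat = make_box("mdat", b"".join(sample_data))  # packet data part (unit 970)
    first_sample_offset = len(moof) + 8             # skip the 8-byte 'mdat' header
    return moof + mdat, first_sample_offset         # connection (unit 971)
```

For a 16-byte mfhd box and samples of 300 and 200 bytes, the packet is 532 bytes long and the first sample starts 32 bytes after the beginning of 'moof'.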

However, when the extension part of an MP4 file multiplexed
by such a conventional multiplexer is played back, the following
problems arise.

Since a conventional multiplexer multiplexes data without
considering the playback start times of the samples included in
each packet, an audio sample synchronized with a video sample
having a certain playback time may be stored in a packet different
from the one that stores that video sample. This lowers the
efficiency of data access when a playback apparatus plays back the
MP4 file.

Also, since a conventional multiplexer multiplexes data
based on the number of samples included in a packet,
randomly-accessible samples, that is, video samples corresponding
to intra frames, are in most cases stored at a different position in
each packet. Therefore, the amount of calculation needed for
searching samples becomes huge, because the MP4 file playback
apparatus must search all the video samples included in a packet
when searching for randomly-accessible samples.

These problems will be explained in detail with reference to
FIG. 11.

FIG. 11 is a diagram for explaining problems of a 
conventional multiplexer. 

FIG. 11A illustrates the first problem, namely that the
efficiency of data access deteriorates during playback.

The header information of the samples included in each mdat
is stored in the moof immediately preceding that mdat. The header
information of the video sample with a playback start time of 20 s,
stored in mdat_1, is stored in moof_1 as its leading sample, while
the header information of the audio sample with a playback time of
20 s, stored in mdat_10, is stored in moof_10 as its last sample.

Therefore, when trying to play back the content from the
20-second point, the MP4 file playback apparatus, after obtaining
the header information of the video samples stored in moof_1, must
search the data all the way up to moof_10 to obtain the header
information of the audio samples, which lowers the efficiency of
data access.

FIG. 11B illustrates the second problem, namely that the
amount of calculation needed for searching randomly-accessible
samples becomes huge.

The header information concerning the "i"th
randomly-accessible video sample, stored at the end of mdat_1, is
stored as the last sample in moof_1, and the header information
concerning the "i+1"th randomly-accessible video sample, stored at
the end of mdat_3, is stored as the last sample in moof_3.

Therefore, the MP4 file playback apparatus must search up
to the last sample of a moof when trying to perform random access,
and thus the amount of calculation needed for searching becomes
huge.

Further, in addition to the first and second problems, there
is another problem: with the structure of the extension part of the
MP4 file generated by the conventional multiplexer, many seeks are
needed to obtain the sample data, so the structure is not suited to
random access playback in an apparatus with a slow seek speed,
such as an optical disc playback apparatus.

This problem will be explained with reference to FIG. 11B
again. To perform random access to the "i"th randomly-accessible
video sample in moof_1, the playback apparatus first moves a
reading pointer to the beginning of moof_1 in order to obtain the
header information of the "i"th randomly-accessible video sample,
and then analyzes the data in moof_1 in sequence. At this point,
the first seek becomes necessary.

After that, the playback apparatus obtains the information
as to which part of mdat_1 stores the entity data of the "i"th
randomly-accessible video sample and moves the reading pointer to
the starting position of that entity data. Since the entity data of
the "i"th randomly-accessible video sample is stored at the end of
mdat_1, it cannot be obtained simply by advancing the reading
pointer sequentially from the beginning of moof_1, and thus a
second seek becomes necessary.

In other words, since a separate seek is performed when the
reading pointer is moved to the beginning of moof_1 and again
when it is moved to the starting position of the entity data, random
access playback takes a long time in a playback apparatus with a
slow seek speed. In particular, when the entity data of an audio
sample or the like synchronized with the "i"th randomly-accessible
video sample is stored away from the entity data of the video
sample, for example in a different packet, an additional seek
becomes necessary, and immediate random access playback is
impossible.

The present invention has been conceived in view of these
problems, and an object of the present invention is to provide a
multiplexer that can multiplex media data so that data access is
efficient when the multiplexed media data file is played back and
the amount of calculation needed for searching samples is reduced.

Another object is to provide a multiplexer that can
multiplex media data so that an apparatus with a slow seek speed
can perform random access playback of the multiplexed file.

A further object is to provide a demultiplexer that obtains
a file multiplexed by the multiplexer and demultiplexes the
multiplexed file.



BRIEF SUMMARY OF THE INVENTION

In order to achieve the above-mentioned objects, the
multiplexer of the present invention generates multiplexed data by
multiplexing packets of media data including image data and at
least one of audio data and text data, and comprises: a media data
obtainment unit operable to obtain the media data; an analysis
unit operable to analyze the media data obtained by the media data
obtainment unit and obtain playback start time information that
indicates the playback start time of each sample, a sample being the
smallest access unit of the image data, audio data and text data
included in the media data; a packetization part determination unit
operable to determine, based on the playback start time information
obtained by the analysis unit, a packetization part of the media data
so that the playback start times of the respective samples of the
image data, audio data and text data included in the media data are
the same or approximately the same; a packet header part generation
unit operable to generate a packet header part that holds the header
of the media data for each packetization part determined by the
packetization part determination unit; a packet data part generation
unit operable to generate a packet data part that holds the entity
data of the media data for each packetization part determined by the
packetization part determination unit; and a packetization unit
operable to generate a packet by connecting the packet header part
generated by the packet header part generation unit with the packet
data part generated by the packet data part generation unit.

In this way, the image data, audio data and text data included in the media data are stored in the packet with the same or approximately the same playback start times, which makes it possible to improve the data access efficiency of the playback apparatus during playback.

Also, in the multiplexer in the present invention, the image data is video data; the analysis unit further analyzes the video data obtained by the media data obtainment unit and obtains intra frame information in the case where the video data includes at least one sample that includes intra frame information indicating that the sample is an intra coded sample; and, in the case where the analysis unit obtains the intra frame information, the packetization part determination unit preferably determines the packetization part of the media data based on the intra frame information and the playback start time information and places the video sample including the intra frame information at the leading part of the packetization part.

In this way, since the leading video sample included in a packet is the video sample of an intra frame, it is possible to greatly reduce the amount of calculation needed for searching samples when the playback apparatus performs random access.

Further, in the multiplexer in the present invention, the packet data part generation unit preferably generates the packet data part by interleaving the samples of the media data items included in the packetization part so that their playback start times are in ascending order.

In this way, since the video samples and the audio samples are stored in mdat in ascending order of playback start time, the number of seek operations needed when the playback apparatus performs random access is reduced, which enables even a playback apparatus with a slow seek speed to realize immediate random access playback.
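As a rough illustration of the interleaving described above, the following Python sketch merges video and audio samples, each list already sorted by playback start time, into one list in ascending playback-start-time order, which is the order in which their entity data would be written into mdat. The `Sample` type, its fields and the function name are hypothetical and chosen only for this example; integer playback start times in a common timescale are assumed.

```python
import heapq
from dataclasses import dataclass


@dataclass(frozen=True)
class Sample:
    kind: str    # hypothetical label: "V" for a video sample, "A" for an audio sample
    start: int   # playback start time, assumed to be an integer in a common timescale
    data: bytes  # entity data of the sample


def interleave_by_start_time(video, audio):
    """Merge two lists that are each already sorted by playback start time
    into a single list in ascending playback-start-time order.
    On equal start times, samples from the first argument (video) come first."""
    return list(heapq.merge(video, audio, key=lambda s: s.start))
```

Because `heapq.merge` only walks each input once, the merge is linear in the number of samples; the mdat payload for the packet would then be `b"".join(s.data for s in merged)`.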

Note that the present invention can be realized not only as such a multiplexer but also as a multiplexing method that includes, as steps, the characteristic units of the multiplexer, or as a program that causes a computer to execute these steps. Such a program can be distributed via a recording medium such as a CD-ROM or via a communication medium such as the Internet.
Brief Description of the Drawings

FIG. 1 is a diagram for explaining the structures of boxes that constitute a conventional MP4 file;

FIG. 2 is a diagram for explaining the basic part of the conventional MP4 file;

FIG. 3A is a diagram for explaining the structure of a movie box in the conventional MP4 file;

FIG. 3B is a tree-shaped diagram showing the structure of the movie box in the conventional MP4 file;

FIG. 4 is a diagram showing the structure of the MP4 file including the conventional extension part;

FIG. 5 is a diagram for explaining the structure of the conventional movie fragment box;

FIG. 6 is a diagram for explaining the structure of a conventional track fragment run box;

FIG. 7A is a diagram showing the first structural example of the MP4 file including the conventional extension part;

FIG. 7B is a diagram showing the second structural example of the MP4 file including the conventional extension part;

FIG. 8 is a block diagram showing the structure of the conventional multiplexer;

FIG. 9 is a flow chart showing the processing operation of a conventional packet unit determination unit;

FIG. 10 is a diagram showing an example of a packet generation table that stores header information items of conventional video samples;

FIG. 11A is a diagram for explaining the first problem of the conventional multiplexer;

FIG. 11B is a diagram for explaining the second problem of the conventional multiplexer;

FIG. 12 is a block diagram showing the functional structure of the multiplexer in the first embodiment of the present invention;

FIG. 13 is a flow chart showing the processing operation of the multiplexer;

FIG. 14 is a flow chart showing the processing operation of a video packetization part determination unit;

FIG. 15 is a flow chart showing the processing operation of an audio packetization part determination unit;

FIG. 16A is a diagram showing the first example of the data structure of the MP4 file extension part generated by the multiplexer;

FIG. 16B is a diagram showing the second example of the data structure of the MP4 file extension part generated by the multiplexer;

FIG. 17 is a block diagram showing the functional structure of the packetization part determination unit of the multiplexer in a second embodiment;

FIG. 18 is a flow chart showing the first processing operation of the video packetization part determination unit;

FIG. 19 is a flow chart showing the second processing operation of the video packetization part determination unit;

FIG. 20A is a diagram showing the first example of the data structure of the MP4 file extension part generated by the multiplexer;

FIG. 20B is a diagram showing the second example of the data structure of the MP4 file extension part generated by the multiplexer;

FIG. 21 is a block diagram showing the functional structure of the packet data generation unit of the multiplexer in a third embodiment;

FIG. 22 is a flow chart showing the processing operation of the packet data generation unit;

FIG. 23 is a diagram showing the outline of the data structure of the MP4 file extension part generated by the multiplexer;

FIG. 24 is a diagram showing the first example of the data structure of the MP4 file extension part generated by the multiplexer;

FIG. 25 is a diagram showing the second example of the data structure of the MP4 file extension part generated by the multiplexer;

FIG. 26 is a block diagram showing the functional structure of a demultiplexer in a fourth embodiment;

FIG. 27 is a flow chart showing the processing operation of the demultiplexer; and

FIG. 28 is a diagram showing an application of the multiplexer in the present invention.

Detailed Description of the Invention

Embodiments of the present invention will be explained below with reference to the figures.

Note that, in the first embodiment, MPEG-4 Visual coded data is used as the video data and MPEG-4 Audio coded data is used as the audio data. The first embodiment mainly explains an apparatus that multiplexes video data and audio data, but this is not intended to exclude the multiplexing of other media data such as text data.

(First Embodiment) 

First, the multiplexer in the first embodiment of the present 
invention will be explained with reference to FIG. 12 to FIG. 16. 
FIG. 12 is a block diagram showing the functional structure of the multiplexer in the first embodiment of the present invention.

This multiplexer 100 is an apparatus that generates an MP4 file containing an extension part by multiplexing video data and audio data, and includes a first input unit 101, a first data storage unit 102, a first analysis unit 103, a second input unit 104, a second data storage unit 105, a second analysis unit 106, a packetization part determination unit 107, a packet generation table storage unit 111, a packet header generation unit 112, a packet data generation unit 113 and a packet connection unit 114.

The first input unit 101 is an interface that captures coded video data from an image coding apparatus or the like, inputs it into the multiplexer 100, and causes the first data storage unit 102 to store the obtained video input data items in sequence.

The first data storage unit 102 is a cache memory, a random access memory (RAM) or the like that temporarily stores the video input data.

The first analysis unit 103 is a processing unit that reads out, from the video input data items stored in the first data storage unit 102, the video sample data, that is, the data of a single video sample, analyzes it and outputs the header information of the video sample; the first analysis unit 103 is realized by a CPU or a memory. Note that the header information of the video sample outputted by the first analysis unit 103 includes the size of the video sample, its playback duration and information indicating whether or not it is an intra frame. Further, in the case where the sample uses inter prediction, the header information of the video sample includes the difference between the decoding time and the display time.

The second input unit 104 is an interface that captures coded audio data from an audio coding apparatus or the like, inputs it into the multiplexer 100, and causes the second data storage unit 105 to store the obtained audio input data items in sequence.

The second data storage unit 105 is a cache memory, a RAM or the like that temporarily stores the audio input data.

The second analysis unit 106 is a processing unit that reads out, from the audio input data items stored in the second data storage unit 105, the audio sample data, that is, the data of a single audio sample, analyzes it and outputs the header information of the audio sample; the second analysis unit 106 is realized by a CPU or a memory. Note that the header information of the audio sample outputted by the second analysis unit 106 includes the size of the audio sample and information indicating its playback duration.

The packetization part determination unit 107 is a processing unit that integrates the header information items of the video samples and the audio samples included in a packet and determines the packetization part of the video data and the audio data so that the playback start time of the video samples included in the packet is the same or approximately the same as the playback start time of the audio samples; it is realized by a CPU or a memory. Also, the packetization part determination unit 107 outputs the collection of sample header information items for a determined packetization part to the packet generation table storage unit 111 as a packet generation table, and, after the packetization part is determined, outputs to the packet header generation unit 112 a packet generation signal that instructs generation of the packet header. This packetization part determination unit 107 includes a time adjustment unit 108 that adjusts packetization parts by the total duration of the samples in a packet, a video packetization part determination unit 109 that determines packetization parts of video data, and an audio packetization part determination unit 110 that determines packetization parts of audio data.
The time adjustment unit 108 is a processing unit that adjusts the end time of a packet so that the packet finishes within a predetermined time unit. This time adjustment unit 108 first outputs a predetermined time (target time) to the video packetization part determination unit 109. Note that a user may specify this target time. In this case, the multiplexer 100 obtains the specification of the target time via an input apparatus such as a keyboard, and outputs a target time input signal showing the target time specified via the input apparatus to the time adjustment unit 108.

The video packetization part determination unit 109 is a 
processing unit that obtains the video sample header information 
from the first analysis unit 103 and determines the packetization 
part of the video data. 

This video packetization part determination unit 109 obtains the target time from the time adjustment unit 108 and the video sample header information from the first analysis unit 103, and, counting the playback duration included in each video sample header information item, adds header information items up to that of the last video sample included in the packet so that the video data in the packet fits within the target time. After adding the header information item of the last video sample included in the packet, the video packetization part determination unit 109 outputs to the audio packetization part determination unit 110 video sample playback time information showing the playback start time of the first video sample included in the packet and the total playback duration of the video samples included in the packet.

The audio packetization part determination unit 110 is a processing unit that obtains the audio sample header information from the second analysis unit 106 and determines the packetization part of the audio data.
This audio packetization part determination unit 110 obtains the video sample playback time information from the video packetization part determination unit 109 and the audio sample header information from the second analysis unit 106. It places, at the leading part of the packet, the audio sample whose playback start time is the same or approximately the same as the playback start time of the leading video sample included in the packet, and, counting the playback duration included in each audio sample header information item, determines the last audio sample included in the packet so that the total playback duration of the audio samples included in the packet is the same or approximately the same as the total playback duration of the video samples included in the packet.

Note that an audio sample whose playback start time is the same or approximately the same as the playback start time of the video sample means here either the audio sample with the earliest playback start time after the playback start time of the video sample or the audio sample with the last playback start time before the playback start time of the video sample.
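The two candidates named above can be located with a binary search over the sorted audio playback start times. The Python sketch below is one possible selection rule, not necessarily the one used by the described apparatus: it takes whichever candidate is closer to the video start time and, on a tie, the earlier one. An audio sample starting exactly at the video start time is treated as the "before" candidate. The function name and the integer time values are assumptions for illustration.

```python
import bisect


def leading_audio_index(audio_starts, video_start):
    """Return the index of the audio sample whose playback start time is the
    same as, or approximately the same as, video_start.

    audio_starts must be sorted in ascending order. The candidates are the
    last sample starting at or before video_start and the first sample
    starting after it; the closer one is chosen (earlier one on a tie).
    This closeness rule is an assumption made for the example.
    """
    if not audio_starts:
        raise ValueError("no audio samples")
    i = bisect.bisect_right(audio_starts, video_start)  # first start > video_start
    before, after = i - 1, i
    if before < 0:                   # every audio sample starts after the video sample
        return after
    if after >= len(audio_starts):   # every audio sample starts at or before it
        return before
    if video_start - audio_starts[before] <= audio_starts[after] - video_start:
        return before
    return after
```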

After that, the audio packetization part determination unit 110 adds the header information items of the audio samples, from the leading audio sample to the last audio sample included in the packet, to the audio packet generation table in sequence.

The packet generation table storage unit 111 is a cache memory, a RAM or the like that temporarily stores the video packet generation table and the audio packet generation table outputted from the packetization part determination unit 107.

The packet header generation unit 112 is a processing unit that generates a packet header part (moof) that stores the header information of a packet, and is realized by a CPU or a memory.

This packet header generation unit 112 obtains the packet generation signal from the packetization part determination unit 107, reads out the sample header information of the packet from the packet generation table storage unit 111 by referring to the packet generation table so as to generate moof data, and outputs the moof data to the packet connection unit 114.
Also, the packet header generation unit 112 outputs mdat information to the packet data generation unit 113. The mdat information includes pointer information showing where in the first data storage unit 102 and the second data storage unit 105 the entity data items of the video samples and the audio samples included in a packet are stored, sample size information showing the size of each sample, and a signal that instructs generation of a packet data part (mdat).

Note that, when generating moof, this packet header generation unit 112 can store the header information items of media data coded using a coding method in which the coded rate is switched in the middle of the data, such as the Adaptive Multi-Rate codec (AMR), in a different traf (track fragment box) for each coded rate.
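One way to picture storing AMR header information in a different traf per coded rate is to split the audio sample headers, kept in decoding order, into runs of consecutive samples that share a coded rate, with each run described by its own traf. The Python sketch below assumes exactly that run-based grouping, which is an interpretation for illustration; the `AudioHeader` type and its fields are hypothetical, and no actual traf box is serialized.

```python
from itertools import groupby
from typing import List, NamedTuple, Tuple


class AudioHeader(NamedTuple):
    coded_rate: int  # hypothetical field: coded bit rate of the sample (e.g. an AMR mode in bit/s)
    size: int        # sample size in bytes
    duration: int    # playback duration in timescale units


def split_into_trafs(headers: List[AudioHeader]) -> List[Tuple[int, List[AudioHeader]]]:
    """Split sample headers (in decoding order) into runs of consecutive samples
    with the same coded rate; each (rate, run) pair stands for one traf.
    Decoding order is preserved because only adjacent samples are grouped."""
    return [(rate, list(run)) for rate, run in groupby(headers, key=lambda h: h.coded_rate)]
```

Grouping only adjacent samples keeps the samples of the fragment in their original order, which a plain dictionary keyed by rate would not.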

The packet data generation unit 113 is a processing unit that generates a packet data part (mdat) in which the entity data of a packet is stored, and is realized by a CPU or a memory.

This packet data generation unit 113 obtains the mdat information from the packet header generation unit 112, reads out the video entity data of the video samples included in a packet from the first data storage unit 102 and the audio entity data of the audio samples included in the packet from the second data storage unit 105, based on the pointer information and the sample size information included in the mdat information, so as to generate mdat data, and outputs the mdat data to the packet connection unit 114.
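The mdat generation step can be sketched as copying byte ranges out of the two storage buffers in the order given by the mdat information. In this Python sketch the storage units are modelled as plain byte strings and each entry of the mdat information as a hypothetical `SampleRef` holding which store to read, the pointer (byte offset) and the sample size; these names are assumptions for illustration only.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class SampleRef:
    store: str   # hypothetical: "video" (first data storage unit) or "audio" (second)
    offset: int  # pointer information: byte offset of the entity data in that store
    size: int    # sample size information in bytes


def build_mdat_payload(refs: Iterable[SampleRef], video_store: bytes, audio_store: bytes) -> bytes:
    """Concatenate the entity data of the referenced samples, in the given order,
    into the payload of one packet data part (mdat)."""
    stores = {"video": video_store, "audio": audio_store}
    parts = []
    for ref in refs:
        buf = stores[ref.store]
        if ref.offset < 0 or ref.offset + ref.size > len(buf):
            raise ValueError(f"sample out of range: {ref}")
        parts.append(buf[ref.offset:ref.offset + ref.size])
    return b"".join(parts)
```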

The packet connection unit 114 is a processing unit that connects moof data with mdat data and generates MP4 extension data for a single packet, and is realized by a CPU or a memory. This packet connection unit 114 obtains moof data from the packet header generation unit 112 and mdat data from the packet data generation unit 113, generates the MP4 extension data for a single packet by connecting the moof data with the mdat data, and outputs the MP4 extension data items generated in sequence to the apparatus that generates the MP4 file.
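At the byte level, connecting moof and mdat amounts to emitting two boxes back to back. The Python sketch below uses the basic ISO base media file format box header (a 32-bit big-endian size that counts the 8-byte header, followed by the 4-character box type). It treats the moof payload as opaque, pre-built bytes; in a real file that payload would contain boxes such as mfhd and traf, and the track run data offsets would have to point into the following mdat, none of which is modelled here. The function names are hypothetical.

```python
import struct


def make_box(box_type: bytes, payload: bytes) -> bytes:
    """Wrap payload in a basic box header: 32-bit big-endian size (header
    included) followed by the 4-byte type. 64-bit 'largesize' boxes are not
    handled in this sketch."""
    if len(box_type) != 4:
        raise ValueError("box type must be exactly 4 bytes")
    size = 8 + len(payload)
    if size > 0xFFFFFFFF:
        raise ValueError("payload too large for a 32-bit box size")
    return struct.pack(">I4s", size, box_type) + payload


def connect_packet(moof_payload: bytes, mdat_payload: bytes) -> bytes:
    """Connect the packet header part (moof) with the packet data part (mdat)
    to form one packet of the MP4 file extension part."""
    return make_box(b"moof", moof_payload) + make_box(b"mdat", mdat_payload)
```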

The processing procedure by which the multiplexer 100 constituted like this generates the extension part of an MP4 file will be explained with reference to FIG. 13.

FIG. 13 is a flow chart showing the processing operation of the multiplexer 100.

First, the first input unit 101 and the second input unit 104 read video data and audio data into the multiplexer 100 (S100); the first input unit 101 causes the first data storage unit 102 to store the video input data, and the second input unit 104 causes the second data storage unit 105 to store the audio input data.

Next, the first analysis unit 103 reads out the video sample data from the first data storage unit 102, analyzes it, and outputs the video sample header information to the video packetization part determination unit 109 of the packetization part determination unit 107. After that, the video packetization part determination unit 109 determines the packetization part of the video data based on the video sample header information obtained from the first analysis unit 103 and the target time obtained from the time adjustment unit 108 (S110). Note that the processing operation by which the video packetization part determination unit 109 determines the packetization part of the video data will be explained in detail later.

After that, the video packetization part determination unit 109 outputs the video sample playback time information of the packet whose packetization part has been determined to the audio packetization part determination unit 110 (S120).

After that, the audio packetization part determination unit 110 determines the packetization part of the audio data based on the video sample playback time information obtained from the video packetization part determination unit 109 (S130). At this time, the audio packetization part determination unit 110 determines the packetization part so that the playback start time of the leading audio sample included in the packet is the same or approximately the same as the playback start time of the leading video sample included in the packet.

When the audio packetization part determination unit 110 has determined the packetization part of the audio data, the packetization part determination unit 107 outputs a packet generation table to the packet generation table storage unit 111 and outputs a packet generation signal to the packet header generation unit 112.

After that, the packet header generation unit 112 generates moof data on the basis of the determined part and outputs it to the packet connection unit 114. The packet data generation unit 113 generates mdat data on the basis of the determined part and outputs it to the packet connection unit 114. The packet connection unit 114 connects the moof data with the mdat data so as to generate a single packet on the basis of the determined part (S140), and outputs it as MP4 extension data for a single packet.

After generating the packet, the multiplexer 100 judges whether the input of data to the first input unit 101 and the second input unit 104 has finished (S150). In the case where input data remains (No in S150), the multiplexer 100 clears the data that has already been packetized from the buffer memories, that is, the first data storage unit 102, the second data storage unit 105 and the packet generation table storage unit 111 (S160), and repeats the above-mentioned processing operations from S110 to S150.

On the other hand, in the case where no input data remains (Yes in S150), the multiplexer 100 finishes the generation processing of the extension part of the MP4 file.

In this way, the multiplexer 100 first determines the packetization part of the video data, then determines the packetization part of the audio data, and generates the extension part of the MP4 file by multiplexing the media data.

Here, the processing operation in step S110 of FIG. 13, by which the video packetization part determination unit 109 determines the packetization part of the video data, will be explained in detail.

FIG. 14 is a flow chart showing the processing operation of 
the video packetization part determination unit 109. 

The video packetization part determination unit 109 obtains the target time from the time adjustment unit 108 prior to this flow.

After that, the video packetization part determination unit 109 obtains the video sample header information from the first analysis unit 103 (S111) and adds the video sample header information to the video packet generation table (S112).

At this time, the video packetization part determination unit 109 judges whether the total of the playback durations of the video samples included in the video sample header information items, that is, the total playback duration of the video data included in the packet, has reached or exceeded the previously obtained target time (S113).

In the case where the total playback duration of the video data included in the packet does not reach the target time (No in S113), the video packetization part determination unit 109 obtains the next video sample header information (S111) and repeats the processing operations of S112 and S113.

In the case where the total playback duration of the video data included in the packet reaches the target time (Yes in S113), the video packetization part determination unit 109 determines the video sample indicated by the video sample header information added last to the video packet generation table as the last video sample included in the packet (S114), and finishes the processing operation of determining a packetization part.
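The loop of S111 to S114 can be condensed into a few lines. This Python sketch works on a list of video sample playback durations rather than full header information items, and returns both the end of the packet and the total playback duration that would be handed to the audio side. The function name, the integer durations and the behaviour when the input runs out before the target time is reached (the packet simply ends with the last sample) are assumptions for illustration.

```python
from typing import List, Tuple


def video_packet_end(durations: List[int], start_index: int, target_time: int) -> Tuple[int, int]:
    """Return (end, total) for the packet that begins at start_index.

    Samples are added one at a time (S111, S112) until their total playback
    duration reaches or exceeds target_time (S113); the sample added last is
    the last video sample of the packet (S114). The packet holds samples
    start_index .. end-1, and total is their summed playback duration.
    """
    total = 0
    i = start_index
    while i < len(durations):
        total += durations[i]   # add this sample's playback duration
        i += 1
        if total >= target_time:
            break               # Yes in S113
    return i, total
```

For example, with 100 ms per video frame and a 350 ms target time, the first packet holds four frames (400 ms), because the check in S113 only stops once the target has been reached or passed.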

Next, the processing operation in step S130 of FIG. 13, by which the audio packetization part determination unit 110 determines the packetization part of the audio data, will be explained in detail.

FIG. 15 is a flow chart showing the processing operation of the audio packetization part determination unit 110.

The audio packetization part determination unit 110 obtains the video sample playback time information from the video packetization part determination unit 109 prior to this flow.

After that, the audio packetization part determination unit 110 obtains the audio sample header information from the second analysis unit 106 (S131), refers to the previously obtained video sample playback time information (S132), reads out the playback start time of the leading video sample included in the packet, and determines the audio sample whose playback start time is the same or approximately the same as that of the leading video sample as the leading audio sample of the packet (S133).

After determining the leading audio sample of the packet, the audio packetization part determination unit 110 obtains the audio sample header information items in sequence (S134) and adds them to the audio packet generation table (S135).

After that, the audio packetization part determination unit 110 reads out the total playback duration of the video samples included in the packet by referring to the video sample playback time information (S136), determines the last audio sample included in the packet so that the total playback duration of the audio samples included in the packet is the same or approximately the same as that of the video samples included in the packet (S137), and finishes the processing operation of determining the packetization part.
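Steps S134 to S137 can be sketched as follows, given the index of the leading audio sample chosen in S133 and the total video playback duration read out in S136. "Approximately the same" is interpreted here, as an assumption, as "the run of consecutive audio samples whose total is closest to the video total", with at least one sample always included and the longer run kept on a tie. Since durations are positive, the distance to the target shrinks and then grows, so stopping at the first step that moves away gives the closest total. The function name and integer durations are hypothetical.

```python
from typing import List, Tuple


def audio_packet_end(audio_durations: List[int], first: int, video_total: int) -> Tuple[int, int]:
    """Return (end, total): the packet holds audio samples first .. end-1.

    Samples are added in sequence (S134, S135), and the last one is chosen so
    that their total playback duration is as close as possible to video_total
    (S136, S137). At least one sample is included; on a tie the longer run is kept.
    """
    if not 0 <= first < len(audio_durations):
        raise IndexError("first is out of range")
    total = audio_durations[first]
    end = first + 1
    while end < len(audio_durations):
        candidate = total + audio_durations[end]
        if abs(candidate - video_total) > abs(total - video_total):
            break  # adding this sample would move further from the video total
        total = candidate
        end += 1
    return end, total
```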

The extension part of the MP4 file generated through the processing operation of the multiplexer 100 like this offers excellent data access efficiency at a playback apparatus. The reason will be explained with reference to the data structure examples of the MP4 file extension part generated by the multiplexer 100, shown in FIG. 16.

The MP4 file extension part 200 shown in FIG. 16A is made of a plurality of packets and is connected to the basic part of the MP4 file.

Each of the packets that constitute the MP4 file extension part 200 is made of moof, the packet header part, and mdat, the packet data part. Here, packet_1 denotes the first packet of the MP4 file extension part 200, the moof included in packet_1 is shown as moof_1, and the mdat included in packet_1 is shown as mdat_1. Also, "V" in each mdat of FIG. 16A indicates a video sample, while "A" in each mdat of FIG. 16A indicates an audio sample (the same applies to the other figures).
The video sample whose playback start time is 20 minutes is stored in mdat_1 of the MP4 file extension part 200 as the leading video sample, and the audio sample whose playback start time is 20 minutes is stored in mdat_1 as the leading audio sample. Also, the video sample whose playback start time is 30 minutes is stored in mdat_2 as the leading video sample, and the audio sample whose playback start time is 30 minutes is stored in mdat_2 as the leading audio sample.

In this way, storing video samples and audio samples in a single packet so that their playback start times are the same or approximately the same as each other makes it possible to greatly reduce the amount of calculation needed for data access when the MP4 file extension part 200 is played back on the playback apparatus side.

Also, since the media data items are stored in each packet with their playback start times made the same or approximately the same as each other, the size of the MP4 file data can be adjusted to a desired size.

Here, the MP4 file extension part generated by the multiplexer 100 may have the data structure shown in FIG. 16B.

FIG. 16B is a diagram showing the second example of the 
data structure of the MP4 file extension part generated by the 
multiplexer 100. 

A video sample whose playback start time is 20 minutes is stored in mdat_1 of the MP4 file extension part 210 shown in FIG. 16B as the leading video sample, and an audio sample whose playback start time is 20 minutes is stored in mdat_2 as the leading audio sample. Also, a video sample whose playback start time is 30 minutes is stored in mdat_3 as the leading video sample, and an audio sample whose playback start time is 30 minutes is stored in mdat_4 as the leading audio sample.

In this way, storing only one of video data and audio data in each packet, and alternately arranging packets storing video data items and packets storing audio data items whose playback start times are the same or approximately the same as each other, can greatly reduce the amount of calculation needed for data access when the MP4 file extension part 210 is played back on the playback apparatus.

As explained up to this point, the multiplexer 100 in the first embodiment can improve the efficiency of data access on the playback apparatus side, because the respective media data items are packetized after their playback start times are made the same or approximately the same as each other.

(Second Embodiment)

Next, the multiplexer in the second embodiment of the present invention will be explained with reference to FIG. 17 to FIG. 20.

The multiplexer in the second embodiment has the same main units as the multiplexer 100 in the above-mentioned first embodiment, but differs from it in that its packetization part determination unit includes a unique unit. The following explanation focuses on this difference. Note that the same reference numerals are used for the same units as in the above-mentioned first embodiment, and their explanation is omitted.

FIG. 17 is a block diagram showing the functional structure 
of the packetization part determination unit of a multiplexer in the 
second embodiment. 

This packetization part determination unit 117 is a processing unit that integrates the header information items of the video samples and the audio samples included in a packet and determines a packetization part of the video data and the audio data so that their playback start times are the same or approximately the same as each other and the leading video sample included in a packet is an intra frame. It includes a time adjustment unit 108, a video packetization part determination unit 119 and an audio packetization part determination unit 110.

The video packetization part determination unit 119 is a processing unit that obtains video sample header information from the first analysis unit 103 and determines a packetization part of the video data based on either time or an intra frame, and includes a time-based part adjustment unit 120 and an I frame-based part adjustment unit 121.

The time-based part adjustment unit 120 is the processing unit that adjusts a packetization part of the video data based on the target time outputted from the time adjustment unit 108. By totaling the playback durations indicated in the respective video sample headers, it adjusts the packetization part so that each packet covers a predetermined unit of time.

The I frame-based part adjustment unit 121 is the processing unit that adjusts a packetization part of the video data based on whether information indicating an intra frame is included in the video sample header information outputted from the first analysis unit 103. On obtaining video sample header information that includes information indicating an intra frame, the I frame-based part adjustment unit 121 switches packetization parts at that intra-frame video sample, adjusting the packetization part so that the leading video sample of the next packet is the video sample of the intra frame.

The processing operation by which the video packetization part determination unit 119 determines a packetization part of video data in the multiplexer of the second embodiment, which includes the packetization part determination unit 117 constituted as described above, will now be explained in detail.

FIG. 18 is a flow chart showing the processing operation of the video packetization part determination unit 119.

Prior to this flow, the video packetization part determination unit 119 obtains a target time from the time adjustment unit 108 and stores it in the time-based part adjustment unit 120.

After that, as in the above-mentioned first embodiment, the video packetization part determination unit 119 obtains the video sample header information from the first analysis unit 103 (S201) and adds the video sample header information to the video packet generation table (S202).

At this time, the video packetization part determination unit 119 judges, in the I frame-based part adjustment unit 121, whether information indicating an intra frame is included in the obtained video sample header information (S203).

In the case where information indicating an intra frame is included (Yes in S203), the video packetization part determination unit 119 judges, in the time-based part adjustment unit 120, whether the total playback duration of all the video samples included in the packet exceeds the previously obtained target time (S205).

Here, in the case where no information indicating an intra frame is included (No in S203) or where the total duration does not exceed the target time (No in S205), the video packetization part determination unit 119 updates, in the time-based part adjustment unit 120, the total playback duration of the video samples included in the packet by adding the playback duration indicated in the obtained video sample header information (S204), obtains the next video sample header information (S201) and repeats the above-mentioned processing operation.

On the other hand, in the case where the total duration exceeds the target time (Yes in S205), the video packetization part determination unit 119 determines the video sample immediately before the video sample judged as an intra frame in the I frame-based part adjustment unit 121 to be the last video sample included in the packet (S206), and finishes the processing operation of determining a packetization part of video data.
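The first processing operation (S201 to S206) can be sketched in Python as follows. This is a minimal illustration, not the multiplexer's actual implementation: it assumes that each video sample header is reduced to a playback duration in milliseconds and an intra-frame flag, the names `VideoSampleHeader` and `split_at_intra_frames` are hypothetical, and it splits a whole sample sequence in one pass instead of determining one packet per run of the flow chart.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class VideoSampleHeader:
    """Hypothetical reduced video sample header (assumption)."""
    duration_ms: int  # playback duration of the sample
    is_intra: bool    # True if the header indicates an intra frame


def split_at_intra_frames(
    headers: Iterable[VideoSampleHeader], target_ms: int
) -> List[List[VideoSampleHeader]]:
    """Close a packet just before an intra frame once the samples already
    in the packet last longer than target_ms (first processing operation)."""
    packets: List[List[VideoSampleHeader]] = []
    current: List[VideoSampleHeader] = []
    total_ms = 0  # total playback duration of the samples in `current`
    for header in headers:  # S201: obtain the next header
        # S203 (intra frame?) and S205 (total exceeds target?)
        if header.is_intra and current and total_ms > target_ms:
            packets.append(current)  # S206: previous sample ends the packet
            current, total_ms = [], 0
        current.append(header)  # S202: add to the packet generation table
        total_ms += header.duration_ms  # S204: update the running total
    if current:
        packets.append(current)  # remaining samples form the last packet
    return packets


# 30 samples of 100 ms with an intra frame every 10 samples, target 1500 ms
stream = [VideoSampleHeader(100, i % 10 == 0) for i in range(30)]
print([len(p) for p in split_at_intra_frames(stream, 1500)])  # [20, 10]
```

Because a packet can end only immediately before an intra frame, its duration can run well past the target time when intra frames are sparse, as the 2000 ms first packet in this example shows.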

In the extension part of the MP4 file generated through this processing operation of the video packetization part determination unit 119, the video sample stored in the leading part of each packet is always a video sample of an intra frame. Therefore, at the time of random access on the playback apparatus side, playback can be started from the leading video sample of a packet, and it is possible to greatly reduce the calculation amount needed for searching for a randomly-accessible video sample.

Also, since the video sample stored in the leading part of the packet is always a video sample of an intra frame, the information indicating that a sample is randomly accessible needs to be described only in the leading sample flag field of the trun located in the leading part of the traf that holds the header information of the video track in the packet header part (moof). The sample flag fields for the respective samples in each trun can be omitted by using default values, which reduces the workload of generating moof data and also reduces the size of the whole MP4 file.
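The size reduction can be illustrated with the TrackRunBox (trun) layout of ISO/IEC 14496-12, in which the tr_flags bit 0x000004 signals a single first_sample_flags field and the bit 0x000400 signals a sample_flags field for every sample; the flags of the remaining samples then come from default sample flags (for example, the default sample flags carried in tfhd). The helper below is a hypothetical sketch, not the multiplexer's actual implementation, and the sample_flags values chosen for intra and non-intra samples (sample_depends_on and sample_is_non_sync_sample bits) are common choices assumed for illustration.

```python
import struct
from typing import List, Optional

# tr_flags bits of the TrackRunBox (ISO/IEC 14496-12)
DATA_OFFSET_PRESENT = 0x000001
FIRST_SAMPLE_FLAGS_PRESENT = 0x000004
SAMPLE_DURATION_PRESENT = 0x000100
SAMPLE_SIZE_PRESENT = 0x000200
SAMPLE_FLAGS_PRESENT = 0x000400

# Assumed sample_flags values: sample_depends_on = 2 for an intra frame;
# sample_depends_on = 1 plus sample_is_non_sync_sample for other frames.
SYNC_FLAGS = 0x02000000
NON_SYNC_FLAGS = 0x01010000


def build_trun(durations: List[int], sizes: List[int],
               sample_flags: Optional[List[int]] = None,
               first_sample_flags: Optional[int] = None,
               data_offset: int = 0) -> bytes:
    """Serialize a version-0 trun box carrying sample durations and sizes."""
    if sample_flags is not None and first_sample_flags is not None:
        # The standard does not allow both per-sample and first-sample flags.
        raise ValueError("use either per-sample flags or first_sample_flags")
    tr_flags = DATA_OFFSET_PRESENT | SAMPLE_DURATION_PRESENT | SAMPLE_SIZE_PRESENT
    if sample_flags is not None:
        tr_flags |= SAMPLE_FLAGS_PRESENT
    if first_sample_flags is not None:
        tr_flags |= FIRST_SAMPLE_FLAGS_PRESENT
    # version (top 8 bits, 0 here) + tr_flags (24 bits), sample_count, data_offset
    body = struct.pack(">IIi", tr_flags, len(durations), data_offset)
    if first_sample_flags is not None:
        body += struct.pack(">I", first_sample_flags)
    for i, (duration, size) in enumerate(zip(durations, sizes)):
        body += struct.pack(">II", duration, size)
        if sample_flags is not None:
            body += struct.pack(">I", sample_flags[i])
    # Box header: 32-bit size followed by the four-character type.
    return struct.pack(">I4s", 8 + len(body), b"trun") + body


n = 15  # one intra frame followed by 14 other frames
durations, sizes = [3000] * n, [1200] * n
full = build_trun(durations, sizes, [SYNC_FLAGS] + [NON_SYNC_FLAGS] * (n - 1))
compact = build_trun(durations, sizes, first_sample_flags=SYNC_FLAGS)
print(len(full), len(compact))  # 200 144
```

In this example, replacing per-sample flags with a single first_sample_flags field saves 4 bytes for every sample after the first in each trun (56 bytes for 15 samples).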

Note that, in this processing operation, the playback duration of a single packet may become long when the interval between the intra frames included in the video data is wide. Therefore, the packetization part determination unit 117 may perform the processing operation described below.

FIG. 19 is a flow chart showing the second processing operation of the video packetization part determination unit 119.

As in the above-mentioned first processing operation, the video packetization part determination unit 119 obtains a target time from the time adjustment unit 108 and stores it in the time-based part adjustment unit 120 prior to this flow.

After that, the video packetization part determination unit 
119 obtains the video sample header information from the first 
analysis unit 103 (S211) and adds the video sample header 
information to the video packet generation table (S212). 

At this time, the video packetization part determination unit 119 judges, in the time-based part adjustment unit 120, whether the total playback time of all the samples included in the packet exceeds the previously obtained target time (S213).

In the case where the total time exceeds the target time (Yes in S213), the video packetization part determination unit 119 determines the video sample indicated by the video sample header information immediately preceding the video sample header information obtained this time to be the last video sample included in the packet (S214), and finishes the processing operation of determining the packetization part of the video data.

On the other hand, in the case where the total time does not exceed the target time (No in S213), the video packetization part determination unit 119 judges, in the I frame-based part adjustment unit 121, whether information indicating an intra frame is included in the obtained video sample header information (S215).

Here, in the case where information indicating an intra frame is included (Yes in S215), the video packetization part determination unit 119 determines the video sample immediately before the video sample judged as an intra frame in the I frame-based part adjustment unit 121 to be the last video sample included in the packet (S214), and finishes the processing operation of determining the packetization part of the video data.

On the other hand, in the case where no information indicating an intra frame is included (No in S215), the video packetization part determination unit 119 updates, in the time-based part adjustment unit 120, the total playback duration of the video samples included in the packet by adding the playback duration indicated in the obtained video sample header information (S216), obtains the next video sample header information (S211) and repeats the above-mentioned processing operation.
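The second processing operation (S211 to S216) differs from the first only in the closing condition: the packet is closed when the accumulated duration exceeds the target time or when an intra frame arrives. The sketch below reduces each video sample header to a duration and an intra-frame flag (a hypothetical `VideoSampleHeader`; the function name is also hypothetical). The intra-frame check is assumed not to apply to the first sample of a packet, since an intra frame that starts a packet must not close an empty one.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class VideoSampleHeader:
    """Hypothetical reduced video sample header (assumption)."""
    duration_ms: int  # playback duration of the sample
    is_intra: bool    # True if the header indicates an intra frame


def split_by_time_or_intra(
    headers: Iterable[VideoSampleHeader], target_ms: int
) -> List[List[VideoSampleHeader]]:
    """Close a packet when its samples exceed target_ms or when the next
    sample is an intra frame (second processing operation)."""
    packets: List[List[VideoSampleHeader]] = []
    current: List[VideoSampleHeader] = []
    total_ms = 0  # total playback duration of the samples in `current`
    for header in headers:  # S211: obtain the next header
        # S213: time limit reached?  S215: intra frame (not as first sample)?
        if current and (total_ms > target_ms or header.is_intra):
            packets.append(current)  # S214: previous sample ends the packet
            current, total_ms = [], 0
        current.append(header)  # S212: add to the packet generation table
        total_ms += header.duration_ms  # S216: update the running total
    if current:
        packets.append(current)  # remaining samples form the last packet
    return packets


# 25 samples of 100 ms, intra frames at samples 0 and 20, target 500 ms
stream = [VideoSampleHeader(100, i in (0, 20)) for i in range(25)]
packets = split_by_time_or_intra(stream, 500)
print([len(p) for p in packets])  # [6, 6, 6, 2, 5]
```

Here the first three packets are closed by the time limit (600 ms each, the first total to exceed 500 ms), and the fourth is cut short so that the intra frame at sample 20 leads the last packet.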

Through this second processing operation of the video packetization part determination unit 119, the extension part of the MP4 file can be generated with packets whose playback time is limited to a predetermined time, so that the packet size is kept within the desired size, and in which any video sample of an intra frame is stored in the leading part of a packet. At the time of random access on the playback apparatus side, it is therefore only necessary to judge whether the leading video sample of each packet is a randomly-accessible video sample. Thus, it becomes possible to reduce the calculation amount needed for searching for randomly-accessible video samples.

Note that, as in the above-mentioned first embodiment, when the video packetization part determination unit 119 finishes the processing operation of determining a packetization part of video data, it outputs the video sample playback time information to the audio packetization part determination unit 110, and the processing operation of determining a packetization part of audio data is performed in the audio packetization part determination unit 110.

The extension part of the MP4 file generated through the processing operation of the packetization part determination unit 117 described above reduces the searching workload at the time of random access in a playback apparatus. The reason will be explained with reference to FIG. 20, which shows data structure examples of the MP4 file extension part generated by the multiplexer in the second embodiment.

In mdat_1 of the MP4 file extension part 220 shown in FIG. 20A, a video sample of an intra frame is stored as the leading video sample, and a video sample of an intra frame is also stored in mdat_2 as the leading video sample.

In this way, storing a video sample of an intra frame as the leading video sample in each packet makes it sufficient to search only the leading video sample in the packet in order to obtain a randomly-accessible video sample at the time of random access on the playback apparatus side, which eliminates the necessity of searching all the video samples included in the packet. Thus, it is possible to greatly reduce the workload in searching samples at the time of random access.

Also, at this time, describing the information indicating that a sample is randomly accessible only in the leading sample flag field of the trun located in the leading part of the traf that stores the header information of the video track in moof_1 and moof_2 of the MP4 file extension part 220 makes it possible to reduce the size of moof_1 and moof_2.

Here, the extension part of the MP4 file generated by the multiplexer in the second embodiment may instead have the data structure shown in FIG. 20B.

In mdat_1 of the MP4 file extension part 230 shown in FIG. 20B, a video sample of an intra frame is stored as the leading video sample, and a video sample of an intra frame is also stored in mdat_3 as the leading video sample. Audio samples are stored in mdat_2 and mdat_4.

In this way, storing only one of video data and audio data in each packet, and storing a video sample of an intra frame as the leading video sample of each packet that stores video data, makes it possible to greatly reduce the workload in searching samples at the time of random access on the playback apparatus side.

Note that, in any of these data structure examples of the MP4 file extension part, making the playback start time of the leading video sample stored in a packet the same or approximately the same as the playback start time of the leading audio sample makes it possible to greatly reduce the calculation amount needed for data access at the playback apparatus.

As explained up to this point, the multiplexer in the second embodiment generates packets so that a randomly-accessible video sample is the leading video sample, which makes it possible to reduce the calculation amount needed for searching samples at the time of random access at the playback apparatus.

(Third Embodiment) 
Next, the multiplexer in a third embodiment of the present invention will be explained with reference to FIG. 21 to FIG. 25.

The multiplexer in the third embodiment has the same main units as the multiplexers in the above-mentioned first and second embodiments, but differs from them in that it has a unique unit in the packet data generation unit. The following explanation focuses on this difference. Note that the same reference numerals are used for the same units as in the above-mentioned first and second embodiments, and their explanation will be omitted.

FIG. 21 is a block diagram showing the functional structure 
of the packet data generation unit of the multiplexer in the third 
embodiment. 

The packet data generation unit 130 is the processing unit that generates a packet data unit (mdat) by interleaving and storing the entity data of the video samples and the entity data of the audio samples. It includes an mdat information obtainment unit 131, a video entity data reading out unit 132, an audio entity data reading out unit 133 and an interleave arrangement unit 134.

The mdat information obtainment unit 131 is the processing unit that obtains the mdat information from the packet header generation unit 112 and outputs entity data read instructions or playback time information to the other units that constitute the packet data generation unit 130.

The mdat information obtainment unit 131 obtains mdat information from the packet header generation unit 112, analyzes the mdat information, and obtains the playback time information indicating the playback start times and playback end times of the video samples and the audio samples. Based on this playback time information, it rearranges the samples so that the playback start times of all the video samples and audio samples included in the packet are in ascending order.

After that, following the rearranged order and starting from the sample whose playback start time is earliest, the mdat information obtainment unit 131 outputs either a video read instruction, which instructs reading out the entity data of a video sample, to the video entity data reading unit 132, or an audio read instruction, which instructs reading out the entity data of an audio sample, to the audio entity data reading unit 133. The video read instruction includes pointer information indicating in which part of the first data storage unit 102 the entity data of the video sample is stored, and the size information of the video sample. The audio read instruction includes pointer information indicating in which part of the second data storage unit 105 the entity data of the audio sample is stored, and the size information of the audio sample.

The video entity data reading unit 132 is the processing unit that obtains the video read instruction from the mdat information obtainment unit 131 and reads out the video entity data from the first data storage unit 102. The video entity data reading unit 132 reads out the video entity data from the first data storage unit 102 with reference to the pointer information and the size information included in the video read instruction, and outputs the read video entity data to the interleave arrangement unit 134.

The audio entity data reading unit 133 is the processing unit that obtains the audio read instruction from the mdat information obtainment unit 131 and reads out the audio entity data from the second data storage unit 105. The audio entity data reading unit 133 reads out the audio entity data from the second data storage unit 105 with reference to the pointer information and the size information included in the audio read instruction, and outputs the read audio entity data to the interleave arrangement unit 134.

The interleave arrangement unit 134 is the processing unit that obtains the read video entity data and audio entity data outputted from the video entity data reading unit 132 and the audio entity data reading unit 133, in the order in which they are outputted, generates mdat data by interleaving and arranging them, and outputs the mdat data to the packet connection unit 114.

The processing operation by which the packet data generation unit 130 constituted as described above generates mdat in the multiplexer of the third embodiment will now be explained in detail.

FIG. 22 is a flow chart showing the processing operation of the packet data generation unit 130.

First, the packet data generation unit 130 obtains mdat information from the packet header generation unit 112 in the mdat information obtainment unit 131 (S301). The mdat information obtainment unit 131 analyzes the obtained mdat information and extracts sample pointer information, sample size information and sample playback time information. After that, based on the extracted sample playback time information, the mdat information obtainment unit 131 rearranges all the video samples and audio samples included in a packet so that their playback start times are in ascending order. Then, following the rearranged order and starting from the sample whose playback start time is earliest, the mdat information obtainment unit 131 outputs either a video read instruction including the pointer information and the size information of the extracted video sample to the video entity data reading unit 132, or an audio read instruction including the pointer information and the size information of the extracted audio sample to the audio entity data reading unit 133.

The video entity data reading unit 132 obtains the video read instruction, reads out the video entity data from the first data storage unit 102 with reference to the pointer information and the size information, and outputs it to the interleave arrangement unit 134. The audio entity data reading unit 133 obtains the audio read instruction, reads out the audio entity data from the second data storage unit 105 with reference to the pointer information and the size information, and outputs it to the interleave arrangement unit 134 (S302).

The interleave arrangement unit 134 receives the read entity data from the video entity data reading unit 132 and the audio entity data reading unit 133 and arranges the data in the order received (S303).

Here, the interleave arrangement unit 134 continues arranging the entity data items until all the entity data items to be stored in a single packet, that is, all the video entity data items and audio entity data items, have been arranged (No in S304, returning to S303).

After that, when all the entity data items to be stored in a single packet have been arranged (Yes in S304), the interleave arrangement unit 134 outputs the arranged entity data to the packet connection unit 114 as mdat data (S305) and finishes the processing operation of generating mdat.
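The steps S301 to S305 amount to a sort followed by concatenation. The sketch below assumes a hypothetical `SampleInfo` entry per sample extracted from the mdat information, uses two byte strings as stand-ins for the first data storage unit 102 and the second data storage unit 105, and places the video sample first when a video sample and an audio sample have the same playback start time (an assumption; the text only requires ascending order). The result is wrapped as an mdat box with a 32-bit size and the type `mdat`.

```python
import struct
from dataclasses import dataclass
from typing import List


@dataclass
class SampleInfo:
    """Hypothetical per-sample entry extracted from the mdat information."""
    track: str     # "video" or "audio"
    start_ms: int  # playback start time
    offset: int    # pointer into the track's data storage
    size: int      # sample size in bytes


def build_mdat(samples: List[SampleInfo], video_store: bytes,
               audio_store: bytes) -> bytes:
    # Rearrange by ascending playback start time; on a tie the video
    # sample goes first (False sorts before True).
    order = sorted(samples, key=lambda s: (s.start_ms, s.track != "video"))
    payload = bytearray()
    for s in order:
        store = video_store if s.track == "video" else audio_store
        # S302: read the entity data; S303: arrange it in this order.
        payload += store[s.offset:s.offset + s.size]
    # S305: emit the arranged entity data as an mdat box.
    return struct.pack(">I4s", 8 + len(payload), b"mdat") + bytes(payload)


video_store = b"V1V2"    # stand-in for the first data storage unit
audio_store = b"a1a2a3"  # stand-in for the second data storage unit
samples = [
    SampleInfo("video", 0, 0, 2), SampleInfo("video", 200, 2, 2),
    SampleInfo("audio", 0, 0, 2), SampleInfo("audio", 100, 2, 2),
    SampleInfo("audio", 200, 4, 2),
]
mdat = build_mdat(samples, video_store, audio_store)
print(mdat[8:])  # b'V1a1a2V2a3'
```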

The extension part of the MP4 file generated through this processing operation of the packet data generation unit 130 is suitable for random access playback in an optical disc apparatus or the like that requires a long seek time. The reason will be explained with reference to FIG. 23, which outlines the data structure of the MP4 file extension part generated by the multiplexer in the third embodiment.

The MP4 file extension part 240 shown in FIG. 23 is made up of a plurality of arranged packets: packet 1, which stores the content data from 4 to 8 seconds; packet 2, which stores the content data from 8 to 12 seconds; and packet 3, which stores the content data from 12 to 16 seconds.

Each packet is made up of moof 241 and mdat 242. The moof 241 stores tfhd (V) and traf (V-1, V-2) concerning the video track, and tfhd (A) and traf (A-1, A-2) concerning the audio track. The entity data of the samples indicated by the header information stored in traf (V-1) and traf (A-1) is stored in mdat_1, and the entity data of the samples indicated by the header information stored in traf (V-2) and traf (A-2) is stored in mdat_2. In addition, in mdat 242, the entity data of the video samples and the entity data of the audio samples are stored alternately interleaved.

Here, in random access processing on the playback apparatus side that starts playback from the 4-second position in playback time, the playback apparatus moves the read pointer to the leading position of moof_1, analyzes moof_1, and then moves the read pointer forward sequentially, which makes it possible to obtain the entity data necessary for playback from mdat_1, which follows moof_1.

In other words, with this MP4 file extension part 240, the playback apparatus can realize random access playback by a single seek operation that moves the read pointer to the leading position of moof_1. Thus, this data structure is effective for an optical disc apparatus or the like that requires a long time to seek.

In mdat 242, since the playback start time of the entity data of an audio sample stored immediately after the entity data of a video sample is made the same or approximately the same as the playback start time of that preceding video sample, synchronous playback of the video data and the audio data is secured.

FIG. 24 shows how the entity data is stored in mdat_1 of the MP4 file extension part 240.

As shown in FIG. 24, the playback start time of the video sample 1 stored in the leading part of mdat_1 is 4000 ms and the playback start time of the audio sample 1 stored immediately after the video sample 1 is 4000 ms, so the playback start times of the video sample 1 and the audio sample 1 are the same or approximately the same as each other.

In most cases, the sample rate of video samples differs from that of audio samples. Here, the playback duration of a video sample is 500 ms, and the playback duration of an audio sample is 100 ms.

Therefore, in mdat_1 of the MP4 file extension part 240,
audio samples 1 to 5 are interleaved and stored immediately after 
the video sample 1, and after them, video sample 2, audio samples 
6 to 10 and video sample 3 are stored in sequence. 

At this time, the playback start time of the video sample 2 is 4500 ms and the playback start time of the audio sample 6 stored immediately after the video sample 2 is 4500 ms; the playback start time of each video sample and that of the audio sample immediately after it are thus always the same or approximately the same as each other.
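The order described for FIG. 24 follows from sorting all samples by playback start time. The hypothetical helper below generates the samples of each track from a start time and a fixed playback duration, and breaks ties by placing the video sample first.

```python
from typing import List


def interleave_order(video_start: int, video_ms: int, n_video: int,
                     audio_start: int, audio_ms: int, n_audio: int) -> List[str]:
    """Return sample names in ascending playback start time order."""
    items = [(video_start + i * video_ms, 0, f"V{i + 1}") for i in range(n_video)]
    items += [(audio_start + j * audio_ms, 1, f"A{j + 1}") for j in range(n_audio)]
    # The middle key (0 = video, 1 = audio) puts video first on equal times.
    return [name for _, _, name in sorted(items)]


print(" ".join(interleave_order(4000, 500, 3, 4000, 100, 10)))
# V1 A1 A2 A3 A4 A5 V2 A6 A7 A8 A9 A10 V3
```

Passing an audio start of 4050 ms instead yields the same sample order, with each audio sample shifted by 50 ms.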

Also, since the sample rate of video samples differs from that of audio samples, there may be a case where the playback start time of a video sample is not exactly the same as the playback start time of the audio sample immediately after it. Even in this case, synchronous playback of the video data and the audio data is secured by using, as the audio sample immediately after the video sample, an audio sample whose playback start time is approximately the same as the playback start time of the video sample.

FIG. 25 is a diagram showing a second data structure example indicating how the entity data is stored in mdat_1 of the MP4 file extension part.

As shown in FIG. 25, the playback start time of the video sample 1 stored in the leading part of mdat_1 of the MP4 file extension part 250 is 4000 ms, and the playback start time of the audio sample 1 stored immediately after the video sample 1 is 4050 ms. That is, the audio sample whose playback start time is the earliest among those after the playback start time of the video sample 1 (the audio sample 1) is stored immediately after the video sample 1.

Here, as in the case already explained, the playback duration of a video sample is 500 ms, and the playback duration of an audio sample is 100 ms.

Therefore, in mdat_1 of the MP4 file extension part 250, audio samples 1 to 5 are interleaved and stored immediately after the video sample 1, followed in sequence by video sample 2, audio samples 6 to 10 and video sample 3.

At this time, the playback start time of the video sample 2 is 4500 ms and the playback start time of the audio sample 6 stored immediately after the video sample 2 is 4550 ms, so the playback start time of each video sample and that of the audio sample immediately after it are made the same or approximately the same as each other.

Note that, alternatively, the audio sample whose playback start time is the latest among those before the playback start time of the video sample may be stored immediately after the video sample. In this case, the playback start time of the audio sample 1 stored immediately after the video sample 1 is 3950 ms.
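The two placement rules for the audio sample that follows a video sample can be expressed as a binary search over the sorted audio start times. In this sketch (hypothetical function name and policy labels), `"next"` picks the earliest audio sample starting at or after the video sample, as in FIG. 25, and `"previous"` picks the latest one starting at or before it, as in the alternative just described; accepting an exact match under both policies is an assumption. The audio start times used in the example are illustrative.

```python
import bisect
from typing import List, Optional


def audio_after_video(audio_starts_ms: List[int], video_start_ms: int,
                      policy: str = "next") -> Optional[int]:
    """Pick the start time of the audio sample placed right after a video sample.

    audio_starts_ms must be sorted in ascending order.
    "next": earliest audio start at or after the video start.
    "previous": latest audio start at or before the video start.
    """
    if policy == "next":
        i = bisect.bisect_left(audio_starts_ms, video_start_ms)
    elif policy == "previous":
        i = bisect.bisect_right(audio_starts_ms, video_start_ms) - 1
    else:
        raise ValueError(f"unknown policy: {policy}")
    # None when no audio sample satisfies the chosen rule.
    return audio_starts_ms[i] if 0 <= i < len(audio_starts_ms) else None


audio_starts = [3950 + 100 * k for k in range(11)]  # 3950, 4050, ..., 4950 ms
print(audio_after_video(audio_starts, 4000))              # 4050
print(audio_after_video(audio_starts, 4000, "previous"))  # 3950
print(audio_after_video(audio_starts, 4500))              # 4550
```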

As explained up to this point, with the multiplexer in the third embodiment, an audio sample whose playback start time is the same or approximately the same as the playback start time of a video sample is placed immediately after the video sample, and the video samples and audio samples are interleaved and stored in mdat so that their playback start times are in ascending order. It is therefore possible to generate an MP4 file extension part having a data structure that enables immediate random access even in a playback apparatus with a slow seek speed. Further, video samples and audio samples may be interleaved in units each consisting of more than one sample.

(Fourth Embodiment)

Next, a demultiplexer in a fourth embodiment of the present invention will be explained with reference to FIG. 26 and FIG. 27.

FIG. 26 is a block diagram showing a functional structure of the demultiplexer in the fourth embodiment.

The demultiplexer 300 is the apparatus that obtains and analyzes MP4 file data including the MP4 file extension part generated by the multiplexer in the above-mentioned first, second and third embodiments, demultiplexes the media data and outputs the playback data. It includes a file input unit 301, a file data storage unit 302, a header demultiplex analysis unit 303, a moov analysis unit 304, a moof analysis unit 305, a traf analysis unit 306, a trun analysis unit 307, an RA searching unit 308 and a sample obtainment unit 309.

The file input unit 301 is an interface that obtains MP4 file data and stores the input data items of the obtained MP4 file in the file data storage unit 302 in sequence.

The file data storage unit 302 is a cache memory, a RAM or 
the like that temporarily stores the MP4 input data.

The header demultiplex analysis unit 303 is the processing unit that reads out and analyzes the header data of the MP4 file from among the MP4 input data items stored in the file data storage unit 302, demultiplexes the moov data of the basic part header of the MP4 file from the moof data of the extension part, and outputs them to the moov analysis unit 304 and the moof analysis unit 305, respectively. It is realized in the form of a CPU or a memory.
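In an MP4 (ISO base media) file, every top-level box begins with a 32-bit size and a four-character type; a size of 1 means a 64-bit size follows the type, and a size of 0 means the box extends to the end of the file. The header demultiplexing step can therefore be sketched as a walk over the top-level boxes that keeps moov and every moof. The helper names are hypothetical, and the sketch only separates the raw box payloads rather than passing them on to the moov and moof analysis units.

```python
import struct
from typing import Iterator, List, Optional, Tuple


def iter_boxes(data: bytes, start: int = 0,
               end: Optional[int] = None) -> Iterator[Tuple[str, int, int]]:
    """Yield (box type, payload start, box end) for each box in data[start:end]."""
    end = len(data) if end is None else end
    pos = start
    while pos + 8 <= end:
        size, raw_type = struct.unpack_from(">I4s", data, pos)
        header = 8
        if size == 1:  # 64-bit largesize follows the type
            size = struct.unpack_from(">Q", data, pos + 8)[0]
            header = 16
        elif size == 0:  # box runs to the end of the data
            size = end - pos
        if size < header or pos + size > end:
            raise ValueError(f"malformed box at offset {pos}")
        yield raw_type.decode("latin-1"), pos + header, pos + size
        pos += size


def split_headers(data: bytes) -> Tuple[Optional[bytes], List[bytes]]:
    """Separate the basic part header (moov) from extension headers (moof)."""
    moov: Optional[bytes] = None
    moofs: List[bytes] = []
    for box_type, payload_start, box_end in iter_boxes(data):
        if box_type == "moov":
            moov = data[payload_start:box_end]
        elif box_type == "moof":
            moofs.append(data[payload_start:box_end])
    return moov, moofs


def box(box_type: bytes, payload: bytes) -> bytes:
    """Build a box with a 32-bit size (test helper)."""
    return struct.pack(">I4s", 8 + len(payload), box_type) + payload


mp4 = (box(b"ftyp", b"isom\x00\x00\x00\x00") + box(b"moov", b"M")
       + box(b"moof", b"f1") + box(b"mdat", b"d1")
       + box(b"moof", b"f2") + box(b"mdat", b"d2"))
print(split_headers(mp4))  # (b'M', [b'f1', b'f2'])
```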

The moov analysis unit 304 is the processing unit that analyzes moov of the MP4 file and obtains the media information necessary for analyzing the media data, such as the coding rate of the media data or the playback time of a content, and is realized in the form of a CPU or a memory. The moov analysis unit 304 outputs the obtained media information to the moof analysis unit 305.

The moof analysis unit 305 is the processing unit that analyzes moof of the MP4 file based on the media information obtained from the moov analysis unit 304 and outputs traf data, which is the header data for each track, to the traf analysis unit 306. It is realized in the form of a CPU or a memory.

The traf analysis unit 306 is the processing unit that analyzes traf of the MP4 file and outputs trun data, which is the header data for each sample included in traf, to the trun analysis unit 307. It is realized in the form of a CPU or a memory.

The trun analysis unit 307 is the processing unit that analyzes trun of the MP4 file, obtains the information described in each field of trun, and outputs trun analysis information to the sample obtainment unit 309. It is realized in the form of a CPU or a memory. The trun analysis information includes, for example, the sample size, data offset information indicating in which part of the file data storage unit 302 the sample is stored, and, in the case of a video sample, flag information indicating whether or not it is an intra frame.
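The fields the trun analysis unit 307 extracts are laid out in the TrackRunBox of ISO/IEC 14496-12: a version byte and 24-bit tr_flags, a sample count, and optional fields whose presence is signalled by tr_flags bits (0x000001 data_offset, 0x000004 first_sample_flags, 0x000100 duration, 0x000200 size, 0x000400 per-sample flags, 0x000800 composition time offset). The parser below is a minimal sketch with a hypothetical name. Samples without their own flags fall back to first_sample_flags or to a default passed in by the caller (in a real file, typically the default sample flags from tfhd), and a sample is reported as an intra (sync) frame when the sample_is_non_sync_sample bit is clear. Note that data_offset is relative to the base offset of the track fragment rather than an absolute file position.

```python
import struct
from typing import Dict, List, Optional, Tuple

NON_SYNC_BIT = 0x00010000  # sample_is_non_sync_sample bit in sample_flags


def parse_trun(payload: bytes, default_sample_flags: int = 0
               ) -> Tuple[Optional[int], List[Dict[str, int]]]:
    """Parse a trun payload (the bytes after the 8-byte box header)."""
    version_and_flags, count = struct.unpack_from(">II", payload, 0)
    version, tr_flags = version_and_flags >> 24, version_and_flags & 0xFFFFFF
    pos = 8

    def read(fmt: str) -> int:
        """Read one 4-byte field and advance the position."""
        nonlocal pos
        (value,) = struct.unpack_from(fmt, payload, pos)
        pos += 4
        return value

    data_offset = read(">i") if tr_flags & 0x000001 else None
    first_flags = read(">I") if tr_flags & 0x000004 else None
    samples: List[Dict[str, int]] = []
    for i in range(count):
        sample: Dict[str, int] = {}
        if tr_flags & 0x000100:
            sample["duration"] = read(">I")
        if tr_flags & 0x000200:
            sample["size"] = read(">I")
        if tr_flags & 0x000400:
            flags = read(">I")
        elif i == 0 and first_flags is not None:
            flags = first_flags
        else:
            flags = default_sample_flags  # e.g. taken from tfhd
        if tr_flags & 0x000800:
            # Unsigned in version 0, signed in version 1.
            sample["cts_offset"] = read(">i" if version == 1 else ">I")
        sample["is_sync"] = not (flags & NON_SYNC_BIT)
        samples.append(sample)
    return data_offset, samples


# tr_flags 0x000305: data_offset, first_sample_flags, duration and size present
payload = struct.pack(">IIiI", 0x000305, 3, 100, 0x02000000)
for size in (500, 200, 210):
    payload += struct.pack(">II", 3000, size)
offset, samples = parse_trun(payload, default_sample_flags=0x01010000)
print(offset, [s["size"] for s in samples], [s["is_sync"] for s in samples])
# 100 [500, 200, 210] [True, False, False]
```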

Also, on obtaining, from the RA searching unit 308 explained next, a playback start instruction that indicates the playback start position after random access and instructs the start of playback, the trun analysis unit 307 analyzes truns in sequence starting from the trun indicated by the playback start instruction and outputs trun analysis information to the sample obtainment unit 309.

The RA searching unit 308 is the processing unit that obtains target playback time information indicating the playback start time after random access, reads out the leading sample information, which is the information indicating the playback start time of the leading sample included in the leading trun in the leading traf that stores the header information concerning the video track, and searches for the video sample that is the playback start position after random access. It is realized in the form of a CPU or a memory. On obtaining the target playback time information from the input apparatus of the demultiplexer 300 that receives a random access instruction from a user, the RA searching unit 308 obtains only the leading sample information from the trun analysis unit 307, one item after another in sequence, searches for a video sample whose playback start time is the same or approximately the same as the time indicated by the target playback time information, and outputs a playback start instruction to the trun analysis unit 307.
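The RA searching unit 308 examines one leading-sample entry per packet rather than every video sample. A minimal sketch, assuming the leading video sample start times have already been read from each packet's leading trun in file order; the function name is hypothetical, and "approximately the same" is modelled by an assumed tolerance in milliseconds.

```python
from typing import Optional, Sequence


def find_random_access_packet(leading_starts_ms: Sequence[int],
                              target_ms: int,
                              tolerance_ms: int = 0) -> Optional[int]:
    """Return the index of the first packet, in file order, whose leading
    video sample starts within tolerance_ms of target_ms, or None."""
    for index, start_ms in enumerate(leading_starts_ms):
        # Only the leading sample information of each packet is examined.
        if abs(start_ms - target_ms) <= tolerance_ms:
            return index
    return None


leading = [0, 4000, 8000, 12000]  # leading video sample start time per packet
print(find_random_access_packet(leading, 8000))       # 2
print(find_random_access_packet(leading, 8100, 200))  # 2
print(find_random_access_packet(leading, 9000, 200))  # None
```

Because every packet begins with a randomly-accessible video sample in the files of the second embodiment, the number of comparisons is bounded by the number of packets, not the number of video samples.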

The sample obtainment unit 309 is the processing unit that reads out and decodes the entity data of a sample based on trun analysis information and outputs the playback data to a display apparatus such as a display. On obtaining trun analysis information from the trun analysis unit 307, the sample obtainment unit 309 refers to the data offset information included in it and reads the entity data of the sample from the file data storage unit 302. Here, the start of obtaining trun analysis information means that the start of playback has been instructed.

The operation of random access processing in the demultiplexer 300 constituted as described above will be explained with reference to FIG. 27.

FIG. 27 is a flow chart showing the operation of random 
access processing of the demultiplexer 300. Note that the
demultiplexer 300 receives a random access instruction from a 
user via an input apparatus prior to this flow. 

First, on obtaining, in the file input unit 301, data items of the MP4 file generated by the multiplexer in the above-mentioned first, second and third embodiments (S400), the demultiplexer 300 stores the data items in the file data storage unit 302 in sequence.

Next, the demultiplexer 300 demultiplexes and analyzes only the file header part of the MP4 file in the header demultiplex and analysis unit 303 (S410), further demultiplexes the basic part header from the extension part header, analyzes the basic part header in the moov analysis unit 304 and analyzes the extension part header in the moof analysis unit 305 (S420).

Subsequently, the demultiplexer 300 further demultiplexes the extension part header into headers for each track in the moof analysis unit 305, and analyzes the track fragment, that is, traf, in the traf analysis unit 306 (S430). At this time, the demultiplexer 300 further demultiplexes the track fragment in the traf analysis unit 306 and analyzes trun in the trun analysis unit 307.

Here, in response to the input of target playback time information to the RA searching unit 308, the demultiplexer 300 outputs the leading sample information from the trun analysis unit 307 to the RA searching unit 308, and the RA searching unit 308 judges whether or not the playback start time indicated by the leading sample information is the same or approximately the same as the time indicated by the target playback time information (S440).

At this time, in the case where no target sample is found (No in S450), the RA searching unit 308 of the demultiplexer 300 obtains the leading sample information in the extension part header that is stored next in the file, and judges whether or not its playback start time is the same or approximately the same as the time indicated by the target playback time information that has already been obtained (S440).

On the other hand, in the case where a target sample is found (Yes in S450), the RA searching unit 308 of the demultiplexer 300 generates a playback start instruction and outputs it to the trun analysis unit 307. On receiving the playback start instruction from the RA searching unit 308, the trun analysis unit 307 outputs trun analysis information to the sample obtainment unit 309, starting from the trun to which the playback start instruction is given. Here, the trun to which the playback start instruction is given is the trun including the sample at which the RA searching unit 308 has indicated that playback should start.

After that, the sample obtainment unit 309 of the demultiplexer 300 refers to the data offset information included in the trun analysis information, obtains the entity data of the target sample from the file data storage unit 302 (S460), decodes the data and outputs the playback data, which completes the random access processing.
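
The search in steps S440 to S460 can be pictured with a minimal Python sketch. It assumes that each extension part header has already been reduced to a hypothetical `FragmentHeader` record holding the playback start time of its leading video sample and the data offset of that sample, and it interprets "approximately the same" as a fixed tolerance; neither the record nor the tolerance value comes from the description above.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class FragmentHeader:
    """Hypothetical summary of one extension part header (moof) for the video track."""
    leading_time: float  # playback start time of the leading sample (seconds)
    data_offset: int     # byte offset of that sample's entity data in the file


def find_random_access_fragment(fragments: Sequence[FragmentHeader],
                                target_time: float,
                                tolerance: float = 0.5) -> Optional[int]:
    """Return the index of the first fragment whose leading sample starts at
    (approximately) target_time, or None if no fragment qualifies.

    Only the leading sample information is compared (S440); fragments are
    visited in the order they are stored in the file (No in S450 -> next one).
    The tolerance is an assumed interpretation of "approximately the same".
    """
    for index, fragment in enumerate(fragments):
        if abs(fragment.leading_time - target_time) <= tolerance:
            return index  # Yes in S450: playback starts from this fragment
    return None
```

With fragments whose leading samples start at 0.0, 2.0 and 4.0 seconds, a target of 4.1 seconds selects the third fragment (index 2), and the sample obtainment step would then read from its `data_offset`. No sample other than the leading one of each fragment is ever examined.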

As explained up to this point, with the demultiplexer 300 of the fourth embodiment, when random access playback is performed on an MP4 file including the MP4 file extension part generated by the multiplexer of the above-mentioned first, second or third embodiment, the video sample that should be the playback start position after random access can be determined by searching only the video sample stored in the leading part of each packet. Thus, the workload of searching for samples at the time of random access is greatly reduced.

(Application) 

Here, an application of the multiplexer in the present invention will be explained with reference to FIG. 28.

FIG. 28 is a diagram showing an application of the multiplexer in the present invention.

The multiplexer in the present invention may be applied to a mobile telephone with a recording function 403 or a personal computer 404 that obtains and multiplexes media data such as video data, audio data or the like and generates MP4 file data. Also, the demultiplexer in the present invention may be applied to a mobile telephone 407 that reads the generated MP4 file data and plays it back.

Here, the MP4 file data generated in the mobile telephone with a recording function 403 and the personal computer 404 are stored in a recording medium such as an SD memory card 405, a DVD-RAM 406 or the like, or sent to the image distribution server 401 via the communication network 402 so as to be distributed from the image distribution server 401 to the mobile telephone 407 or the like.

In this way, the multiplexer and the demultiplexer in the present invention are used for an MP4 file generation apparatus or a playback apparatus in the image distribution system or the like.

Although the multiplexer and the demultiplexer in the present invention have been explained above based on the respective embodiments and the like, the present invention is not limited to these embodiments and the like.

For example, coded data of MPEG-4 Visual are used as video data in the above-mentioned embodiments, but coded data produced by another video compression coding method such as MPEG-4 Advanced Video Coding (AVC), H.263 or the like may be used. Note that a single picture corresponds to a single sample in coded data of MPEG-4 AVC or H.263.

Likewise, coded data of MPEG-4 Audio are used as audio data, but coded data produced by another audio compression coding method such as G.726 may be used as audio data.

Also, although video data and audio data are used in the explanation of the above-mentioned embodiments, the effects of the present invention can also be obtained in the case where text data and the like are included, by processing such data during packetization in the same manner as the audio data.

Further, in the above-mentioned second embodiment, in the case where packetization is performed for each intra frame, the time-based part adjustment unit 120 may be omitted from the units of the packetization part determination unit 117, and the processing of step S205 in FIG. 18 may be omitted.

Also, in the above-mentioned third embodiment, in the case where the MP4 file is played back according to a buffer model that is set in advance for the playback apparatus of the MP4 file, video sample data and audio sample data are interleaved and stored in mdat so that the buffer model is satisfied. A buffer model is a model for guaranteeing that a playback apparatus can perform decoding without the buffer becoming empty (underflow) and without data overflowing the buffer (overflow), by giving the playback apparatus a buffer whose size is prescribed in a standard, in the case where coded data are input according to conditions prescribed in that standard.
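
As a rough illustration of why the interleaving order in mdat matters, the following Python sketch checks a deliberately simplified buffer model: bytes arrive at a constant rate in storage order and each sample leaves the buffer at its decode time. The function name, the constant-rate input and the instantaneous removal are assumptions for illustration only and do not reproduce the buffer model of any particular standard.

```python
from typing import List, Tuple


def check_buffer_model(samples: List[Tuple[int, float]],
                       rate: float,
                       buffer_size: int) -> str:
    """Check a simplified decoder buffer for samples listed in mdat storage order.

    Each entry is (size_in_bytes, decode_time_in_seconds). Assumed model: bytes
    enter the buffer at a constant `rate` (bytes/s) from t = 0 in storage order,
    and each sample is removed instantly at its decode time.
    Returns "underflow", "overflow" or "ok".
    """
    total = sum(size for size, _ in samples)
    arrived = 0
    for size, decode_time in samples:
        arrived += size
        # Underflow: the sample is not completely in the buffer when it is due.
        if arrived / rate > decode_time:
            return "underflow"
    # Occupancy can only peak just before a removal or when the input ends.
    checkpoints = [t for _, t in samples] + [total / rate]
    for t in checkpoints:
        in_buffer = min(rate * t, total) - sum(s for s, d in samples if d < t)
        if in_buffer > buffer_size:
            return "overflow"
    return "ok"
```

For example, with 1000 bytes/s and a 1500-byte buffer, storing 500-byte video samples and 200-byte audio samples alternately passes the check, whereas storing both video samples before both audio samples makes the first audio sample arrive after its decode time (underflow).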

Also, in the above-mentioned first, second and third embodiments, the number of trafs stored in moof of the extension part of the generated MP4 file is not specified, but it is preferable that moof store a single traf per track. This makes it possible to obtain the header information of the samples of all the tracks stored in moof by analyzing only the leading traf of each track in moof, which further improves the efficiency of obtaining header information.

Further, in the above-mentioned first, second and third embodiments, the entity data of the samples whose header information items are stored in moof of the extension part of the generated MP4 file are stored in a single mdat following moof, but the entity data may instead be divided among a plurality of mdats following moof. Specifically, the entity data items of the samples whose header information items are stored in moof_1 may be stored in mdat_1, mdat_2 and mdat_3 in this sequence, and the entity data items of the samples whose header information items are stored in moof_2 may be stored in mdat_4, mdat_5 and mdat_6 in this sequence.

Further, in the above-mentioned second and third embodiments, in the case where an intra frame of video data is included in a packet, the intra frame is placed in the leading part of the packet, but a video sample other than an intra frame, such as a predictive (P) frame, a bidirectionally predictive (B) frame or the like, may be placed in the leading part of the packet on condition that it is randomly accessible. This will be explained below, taking as an example the case where coded data of MPEG-4 AVC are used as video data.

In MPEG-4 AVC, there are cases where a correct decoding result is not obtained even when decoding starts from an intra picture. More specifically, there are two types of intra pictures in MPEG-4 AVC: the instantaneous decoder refresh (IDR) picture and other intra pictures (called non-IDR intra pictures). A correct decoding result is always obtained when decoding starts from an IDR picture, but when decoding starts from a non-IDR intra picture, a correct decoding result may not be obtained for a plurality of pictures that follow the non-IDR intra picture in display order.

Therefore, MPEG-4 AVC allows supplemental enhancement information called "recovery point SEI" to be added, which indicates from which picture decoding should be started in order to obtain a correct decoding result for the non-IDR intra picture.

For example, suppose that five pictures, Pic_1, Pic_2, Pic_3, Pic_4 and Pic_5, are included in the video data in this sequence, that Pic_5 is a non-IDR intra picture, and that decoding must be started from Pic_1 in order to decode Pic_5 and the pictures following Pic_5 in display order. In this case, placing recovery point SEI immediately before Pic_1 makes it possible to indicate that decoding must be started from Pic_1 in order to decode Pic_5, the picture placed four pictures later in storage order in the video data, and the pictures following Pic_5 in display order.

In other words, Pic_1 is a randomly-accessible sample in this case. Thus, in the case of coded data of MPEG-4 AVC, a sample of an IDR picture or of a picture to which recovery point SEI is added can be placed in the leading part of the packet as a randomly-accessible sample. Further, randomly-accessible samples that do not have recovery point SEI can also be the leading sample in a packet. Note that recovery point SEI can be added to a picture other than an intra picture.
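
The rule that Pic_1 must be decoded first in order to obtain Pic_5 correctly can be expressed as a small search. The Python sketch below is a simplification that counts the recovery distance in storage order, as in the Pic_1 to Pic_5 example; the `Picture` type and its `recovery_distance` field are hypothetical and are not the syntax of the recovery point SEI message itself.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Picture:
    name: str
    is_idr: bool = False
    # Hypothetical field: how many pictures later (in storage order) decoding
    # becomes correct when it starts here, as signalled by recovery point SEI.
    recovery_distance: Optional[int] = None


def decoding_start_index(pictures: List[Picture], target: int) -> Optional[int]:
    """Index of the latest picture from which decoding gives a correct result
    at pictures[target], or None if no such picture precedes it."""
    for i in range(target, -1, -1):
        pic = pictures[i]
        if pic.is_idr:
            return i  # decoding from an IDR picture is always correct
        if pic.recovery_distance is not None and i + pic.recovery_distance <= target:
            return i  # recovery point reached at or before the target
    return None
```

For Pic_1 to Pic_5 with a recovery distance of 4 on Pic_1, decoding Pic_5 (index 4) must start at Pic_1 (index 0), while no valid start exists for Pic_3 within these five pictures.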

At this time, the amount of processing required to obtain sample data can be reduced by storing both a sample of the picture to which recovery point SEI is added and a sample of the picture that is first decoded correctly after decoding starts from the picture to which recovery point SEI is added.

Further, a sample of an IDR picture can be distinguished from a sample of a picture to which recovery point SEI is added based on the leading sample flag 930 or on a specific flag value in the sample flag 935 (called the nonsynchronous sample flag). In MP4, the nonsynchronous sample flag can be set to 0 only for those randomly-accessible samples for which the sample at which random access starts is itself correctly decoded. Therefore, the two can be distinguished by setting the nonsynchronous sample flag to 0 in the sample of the IDR picture and to 1 in the sample of the picture to which recovery point SEI is added.
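
A minimal sketch of writing and reading the nonsynchronous sample flag follows. It assumes the 32-bit sample flags layout of the ISO base media file format, in which `sample_is_non_sync_sample` is bit 16 (mask `0x00010000`); the helper names and the separate `randomly_accessible` input, standing for knowledge of recovery point SEI obtained elsewhere, are hypothetical.

```python
# Assumed layout (ISO base media file format sample flags): bit 16 of the
# 32-bit value is sample_is_non_sync_sample.
NON_SYNC_SAMPLE_MASK = 0x00010000


def set_non_sync(flags: int, non_sync: bool) -> int:
    """Return flags with the nonsynchronous sample flag set to 1 or 0."""
    return (flags | NON_SYNC_SAMPLE_MASK) if non_sync else (flags & ~NON_SYNC_SAMPLE_MASK)


def is_non_sync(flags: int) -> bool:
    return bool(flags & NON_SYNC_SAMPLE_MASK)


def classify_sample(flags: int, randomly_accessible: bool) -> str:
    """Distinguish sample kinds by the nonsynchronous sample flag.

    randomly_accessible is a hypothetical input, e.g. True when the picture
    carries recovery point SEI.
    """
    if not is_non_sync(flags):
        return "sync"            # e.g. IDR: the start sample is correctly decoded
    if randomly_accessible:
        return "recovery_point"  # decoding may start here; correct output comes later
    return "other"
```

Under this sketch, an IDR sample is written with `set_non_sync(flags, False)` and a recovery point SEI sample with `set_non_sync(flags, True)`, so a reader can tell them apart from the flag alone once it knows the sample is randomly accessible.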

By using an identification method like the above, randomly-accessible samples can be distinguished from each other according to their nature. In practice, this can be used as follows.

The first case is fast forwarding by playing back only specific samples. In this case, since it is desirable that decoded samples can be displayed immediately, only samples whose nonsynchronous sample flag is 0 are decoded and played back.

The second case is starting playback from the middle of the content, or from the next area after skipping specific areas. In this case, the sample from which decoding starts may differ from the sample that is correctly decoded, but only at the start of playback. Therefore, playback can be started either from a sample whose nonsynchronous sample flag is 0 or from a randomly-accessible sample whose nonsynchronous sample flag is 1.
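
The two uses can be sketched as two selection functions over already-parsed sample information. The `SampleInfo` tuple and both functions are hypothetical, and choosing the latest candidate at or before the target time is only one possible policy for the second case.

```python
from typing import List, NamedTuple, Optional


class SampleInfo(NamedTuple):
    time: float                # playback start time in seconds
    non_sync: bool             # nonsynchronous sample flag (True means 1)
    randomly_accessible: bool  # hypothetical: e.g. carries recovery point SEI


def fast_forward_samples(samples: List[SampleInfo]) -> List[int]:
    """First case: decode and show only samples whose flag is 0."""
    return [i for i, s in enumerate(samples) if not s.non_sync]


def playback_start(samples: List[SampleInfo], target_time: float) -> Optional[int]:
    """Second case: start from the latest sample at or before target_time that
    has flag 0 or is randomly accessible with flag 1 (an assumed policy)."""
    candidates = [i for i, s in enumerate(samples)
                  if s.time <= target_time and (not s.non_sync or s.randomly_accessible)]
    return candidates[-1] if candidates else None
```

Fast forwarding thus skips recovery point samples because their output is not immediately displayable, while a seek may use them because a short delay before correct output is acceptable at the start of playback.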

Note that this storage method is not limited to the case of recovery point SEI of MPEG-4 AVC; it is applicable to any case where the sample from which decoding is started differs from the sample that is correctly decoded. For example, it is applicable to structures such as the Open GOP (Group of Pictures) of MPEG-2 Video.

Further, in the case where identification information indicating that a sample is randomly accessible is provided, the sample identified as randomly accessible by that identification information may be placed in the leading part of the packet.

Industrial Applicability

The multiplexer in the present invention is suitable for a digital video camera, a mobile phone with a recording function or the like that generates MP4 file data by obtaining media data such as video data or audio data and stores the data in a recording medium, or for a personal computer, a PDA or the like that distributes the generated MP4 file data via the Internet, and the demultiplexer in the present invention is suitable for a personal computer, a mobile phone or the like that downloads and plays back the MP4 file data.
ABSTRACT

The multiplexer (100) includes a first input unit (101) that obtains video data, a second input unit (104) that obtains audio data, a first analysis unit (103) that analyzes the video data and obtains video sample header information, and a second analysis unit (106) that analyzes the audio data and obtains audio sample header information. Moreover, the multiplexer includes a packetization part determination unit (107) that, after determining the packetization part of the video data based on the video sample header information, determines the packetization part of the audio data so that it starts at the same or approximately the same time as the playback start time of the video sample placed in the leading part of the packetization part of the video data. Furthermore, the multiplexer includes a packet header part generation unit (112) that generates a packet header part on the basis of the determined packetization part, a packet data generation unit (113) that generates a packet data part on the basis of the determined packetization part, and a packet connection unit (114) that generates a packet by connecting the generated packet header part to the packet data part.
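
The alignment rule performed by the packetization part determination unit (107) can be sketched in Python as follows, assuming for illustration that a new video packet starts at each intra frame, that times are given in milliseconds, and that "approximately the same" means the nearest audio sample start. The function names are hypothetical.

```python
from bisect import bisect_left
from typing import List


def video_packet_starts(is_intra: List[bool]) -> List[int]:
    """Assumed rule: a new video packet begins at every intra frame."""
    return [i for i, intra in enumerate(is_intra) if intra]


def audio_packet_starts(audio_times: List[int], video_start_times: List[int]) -> List[int]:
    """For each video packet start time, pick the audio sample whose start time
    is closest (ties go to the earlier sample). audio_times must be sorted and
    non-empty; all times are in milliseconds."""
    starts = []
    for t in video_start_times:
        j = bisect_left(audio_times, t)  # first audio sample starting at or after t
        if j > 0 and (j == len(audio_times) or t - audio_times[j - 1] <= audio_times[j] - t):
            j -= 1  # the preceding audio sample is at least as close
        starts.append(j)
    return starts
```

With intra frames at 0 ms and 300 ms and audio frames every 64 ms, the second audio packet starts at the audio sample beginning at 320 ms, the nearest start to 300 ms.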